Stable nanoscale nucleic acid assemblies and methods thereof

ABSTRACT

Methods for the top-down design of nucleic acid nanostructures of arbitrary geometry based on a target shape of spherical or non-spherical topology are described. The methods facilitate 3D molecular programming of lipids, proteins, sugars, and RNAs based on a DNA scaffold of arbitrary 2D or 3D shape. Geometric objects are rendered as node-edge networks of parallel nucleic acid duplexes, and a nucleic acid scaffold is routed throughout the network using a spanning tree formula. Nucleic acid nanostructures produced according to the top-down design methods are also described. In some embodiments, the nanostructures include a single-stranded nucleic acid scaffold, DX crossovers, and staple strands. In other embodiments, the nanostructures include a single-stranded nucleic acid scaffold, PX crossovers, and no staples. Modified nanostructures that include chemically modified nucleotides or that are conjugated to other molecules are also described.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of and priority to U.S. Ser. No. 62/328,442 filed Apr. 27, 2016 and U.S. Ser. No. 62/328,450 filed Apr. 27, 2016, the contents of which are incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant Nos. N00014-14-1-0609 and N00014-16-1-2181 awarded by the Office of Naval Research, under Grant No. CMMI-1334109 awarded by the National Science Foundation, under Grant No. RGP0029/2015 awarded by the Human Frontier Science Program (HFSP), and under Grant No. CCF-1547999 awarded by the National Science Foundation (NSF-EAGER). The government has certain rights in the invention.

REFERENCE TO SEQUENCE LISTING

The Sequence Listing submitted Apr. 27, 2017, as a text file named “MIT_18588_PCT_ST25.txt,” created on Apr. 27, 2017, and having a size of 7,408 bytes is hereby incorporated by reference pursuant to 37 C.F.R. § 1.52(e)(5).

FIELD OF THE INVENTION

The present invention relates to the design of arbitrary 2D or 3D geometries using nucleic acids, and in particular to the design of nucleic acid nanostructures having a desired geometric form in order to mimic and/or reproduce existing natural macromolecular assemblies, as well as synthesize entirely new ones.

BACKGROUND OF THE INVENTION

DNA nanotechnology offers the unique ability to synthesize highly structured nanometer-scale assemblies that in principle could rival the geometric complexity found in natural protein and nucleic acid assemblies. The past decade has witnessed dramatic growth in the diversity of structured DNA assemblies that can be programmed from the bottom-up to self-assemble into target shapes using complementary Watson-Crick base pairing (Seeman, N C et al., Biophys. J. 44, 201-209 (1983); Rothemund, P W K, Nature, 440, 297-302 (2006); Rothemund, P W, Nanotechnology: Science and Computation, 3-21 (2006); He, Y et al., Nature. 452, 198-201 (2008); Jones, M R et al., Science. 347, 1260901 (2015); Zhang, F et al., Nat. Nanotechnol. 10, 779-784 (2015)). Scaffolded DNA origami is a particularly powerful means of synthesizing structured DNA assemblies, offering full control over both molecular weight and intricate nanometer-scale structure, with quantitative yield of the programmed product that relies on a single-stranded DNA template (Rothemund, P W K, Nature. 440, 297-302 (2006); Zhang, F et al., Nat. Nanotechnol. 10, 779-784 (2015); Dunn, K E et al., Nature. 525, 82-86 (2015); Shih, W M et al., Nature. 427, 618-621 (2004); Martin, T G et al., Nat. Commun. 3, 1103 (2012); Han, D et al., Science. 332, 342-346 (2011); Ko, et al., Nature Chemistry, 2, pp. 1050-1055, (2010); Han, D et al., Science. 339, 1412-1415 (2013); Castro, C E et al., Nat. Methods. 8, 221-229 (2011)). Wireframe topologies based on the scaffolding principle have further demonstrated highly versatile control over 2D and 3D spatial architecture (Zhang, F et al., Nat. Nanotechnol. 10, 779-784 (2015); Yan, H et al., Science. 301, 1882-1884 (2003); Seeman, et al., Nano Lett., V1, (1) 22-26 (2001); Spink, et al., Biophysical Journal, V 97, 528-538 (2009); Wang, PNAS, 9. 107 (28), pp. 12547-12552 (2010); Gu, et al., Nature Nanotechnology, 4, 245-248 (2009)).

Similar to the challenge of structure-based protein sequence design, which seeks to infer the amino acid sequence needed to fold a target protein structure of interest (Sinclair, J C et al., Nat. Nanotechnol. 6, 558-562 (2011); Gradišar, H et al., Nat. Chem. Biol. 9, 362-366 (2013)), achieving a general strategy for structure-based design of DNA assemblies represents a major challenge as well as opportunity for nanotechnology. While numerous computational design tools exist to aid in the bottom-up, manual programming of scaffolded DNA origami, which requires complex scaffold routing and staple design to realize a target geometry based on Watson-Crick base complementarity, only one approach offers a solution to the inverse problem of sequence design based on specification of target geometry (Benson, E, et al., Nature. 523, 441-444 (2015)). However, this approach is only semi-automated and relies on single duplex DNA arms and multi-junctions to represent polyhedral geometries, which may result in compliant and unstable assemblies that are unsuitable for many applications. Moreover, programmed geometries must be topologically equivalent to a sphere, significantly limiting its scope.

Therefore, it is an object of the invention to provide a method to identify combinations of nucleic acid sequences and oligonucleotide primers that can be combined to produce nanoscale assemblies of nucleic acids including DNA or RNA or DNA/RNA hybrid assemblies.

It is also an object of the invention to provide methods to identify combinations of nucleic acid sequences without oligonucleotide primers that alone can produce nanoscale DNA or RNA assemblies.

It is also an object of the invention to provide fully automated inverse sequence design of nanoscale DNA or RNA or DNA/RNA hybrid assemblies.

It is also an object of the invention to provide arbitrary wireframe DNA assemblies without any limitation to spherical topologies.

It is also an object of the invention to provide arbitrary wireframe DNA assemblies with any even number of DNA or RNA or DNA/RNA hybrid helices per edge, or the geometry of the edge.

It is a further object of the invention to provide structurally stable, rigid nanoscale DNA or RNA or DNA/RNA hybrid assemblies.

It is a further object of the invention to provide methods of designing rigid 3D nucleic acid scaffolds that can be used to pattern in 3D space arbitrary organizations of secondary molecules that either bind directly to the nucleic acid sequence, or are covalently or non-covalently attached to nucleic acid bases or sugars, such as proteins, peptides, aptamers, lipids, sugars, RNAs, PNAs, etc., based on top-down specification of 3D nanoparticle shape and size.

It is also an object of the invention to provide a method to generate single-stranded DNA of arbitrary length, sequence, and modified composition that can assemble into a 3D structure having a desired shape and size.

SUMMARY OF THE INVENTION

Methods for the automated design of nucleic acid nanostructures having arbitrary geometries and scaffold sequences have been developed. The methods generate single-stranded nucleic acid sequences of arbitrary length, with or without chemical modifications, and define rules for sequence space that optimize product yield. An exemplary nucleic acid scaffold sequence is single-stranded DNA.

In some embodiments, methods for designing a nucleic acid nanostructure having a geometric shape include the steps of determining the geometric parameters of an input and identifying a route for a single-stranded nucleic acid scaffold that traces throughout the geometric shape, then generating the sequences of the single-stranded nucleic acid scaffold and optionally the nucleic acid sequence of staple strands that combine to form a nucleic acid nanostructure having the geometric shape.

The methods enable the top-down design of nanostructures formed from nucleic acids, based on the geometry of a desired target shape. Typically, the methods provide the nucleic acid sequences required to form a three-dimensional structure corresponding to a desired geometric form. The methods require only the geometric parameters that define the desired structure as input, and enable the user to optionally define additional parameters, such as the physical size and nucleic acid polymer scaffold sequence. The methods can be computer-based. In some embodiments, output is in the form of a single-stranded nucleic acid polymer that is a scaffold sequence routed throughout every edge of the nanostructure, and one or more oligonucleotide staple sequences that hybridize to the scaffold sequence to provide a double-stranded nucleic acid structure having the desired form with the desired edge type composed of an even number of DNA or RNA helices. In some embodiments, output is in the form of a single-stranded nucleic acid polymer that is a scaffold sequence that is routed several times throughout every edge of the nanostructure, providing a double-stranded nucleic acid structure of the desired form without the need for staples, or with as few staples as desired, by allowing self-hybridization.

Methods of assembling and purifying nucleic acid nanoparticles are also provided. In some embodiments, the methods include one or more steps to alter the chemical or structural properties of the nucleic acid nanoparticles.

Therefore, methods of functionalizing nucleic acid nanoparticles are provided. In some embodiments, methods of functionalizing nucleic acid nanoparticles include one or more steps that alter the chemical or structural properties of the assembled nucleic acid nanoparticles. In some embodiments, the methods of functionalizing nucleic acid nanoparticles include one or more steps that alter the chemical or structural properties of the nucleic acid scaffold prior to assembly of the nucleic acid nanoparticles. In some embodiments, the methods of functionalizing nucleic acid nanoparticles include one or more steps that alter the chemical or structural properties of the nanoparticles through chemical or structural modifications of oligonucleotide staple strands. In an exemplary embodiment, the methods of functionalizing nucleic acid nanoparticles include extension of the oligonucleotide staple strands to produce single-stranded DNA emanating at precise locations on the structure. Therefore, nanostructures can be designed and assembled to mimic biological structures such as virus capsids, toxins, protein assemblies, lipid and sugar organizations, and can be used for applications such as delivery, immune stimulation for vaccines, complexing with proteins or RNA, and sensing.

Typically, the methods include the steps of (a) selecting a desired 2D or 3D form as a target structure; (b) providing geometric parameters and physical dimensions of the target structure; (c) identifying the route of a single-stranded nucleic acid scaffold that traces throughout the entire target structure; and (d) determining the sequences of the single-stranded nucleic acid scaffold and the nucleic acid sequence of staple strands that combine to form a nucleic acid nanostructure having the desired shape. In some embodiments, the target structure is a structure that does not have spherical topology.

The step of providing the geometric parameters and physical dimensions of the target polyhedral structure further can include providing a template nucleic acid scaffold sequence. For example, in some embodiments, the methods include providing the length of one or more of the edges spanning two vertices of the target polyhedral structure. Preferably, the length of each edge is at least 31 base pairs.

In some embodiments, DNA is used as a scaffold. When DNA is a scaffold, the length of each edge can be expressed as a multiple of 10.5 base pairs, rounded up, or rounded down to the nearest whole number. In an exemplary embodiment, the length of each edge is 31 base pairs, 32 base pairs, 42 base pairs, 52 base pairs, 53 base pairs, 63 base pairs, 73 base pairs, or more than 73 base pairs. In other embodiments, RNA is used as a scaffold. When RNA is used as a scaffold, the length of each edge can be expressed as a multiple of 11 base pairs. In an exemplary embodiment, the length of each edge is 33 base pairs, 44 base pairs, 55 base pairs, 66 base pairs, 77 base pairs, or more than 77 base pairs.
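The edge-length rule above can be illustrated with a short calculation. The following Python sketch is offered only as an illustration; the function name `candidate_edge_lengths` and its parameters are hypothetical and do not come from the described methods. It enumerates whole-number edge lengths obtained by rounding integer multiples of the helical repeat (10.5 base pairs for DNA, 11 base pairs for RNA) up or down, starting from three turns.

```python
import math

def candidate_edge_lengths(turn_bp, max_turns, min_turns=3):
    """Whole-number edge lengths near integer multiples of the helical repeat.

    turn_bp   -- base pairs per helical turn (10.5 for DNA, 11 for RNA)
    min_turns -- smallest number of turns considered (3 turns gives about 31 bp)
    """
    lengths = set()
    for turns in range(min_turns, max_turns + 1):
        exact = turns * turn_bp
        lengths.add(math.floor(exact))  # rounded down
        lengths.add(math.ceil(exact))   # rounded up
    return sorted(lengths)

dna_lengths = candidate_edge_lengths(10.5, max_turns=7)
rna_lengths = candidate_edge_lengths(11, max_turns=7)
print(dna_lengths)  # [31, 32, 42, 52, 53, 63, 73, 74]
print(rna_lengths)  # [33, 44, 55, 66, 77]
```

The exemplary DNA edge lengths listed above (31, 32, 42, 52, 53, 63, and 73 base pairs) all appear in the enumerated set, as do the exemplary RNA edge lengths.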

Typically, the geometric parameters provided as input include vertex, face, and edge information, for example, as determined from a polyhedral wire-mesh model of the target shape.

The route of a single-stranded nucleic acid scaffold that traces throughout the entire target structure is typically identified by a method including: (i) producing a node-edge network representing the three-dimensional structure; (ii) determining a spanning tree of the network corresponding to the three-dimensional structure, for example, where the vertices and lines of the structure are the nodes and edges of the network, respectively; (iii) classifying each edge as having a double stranded scaffold crossover, if it is not a member of the spanning tree, or not having one if it is a member of the spanning tree; (iv) splitting the edges that are not members of the spanning tree into two edges, each containing a pseudo-node at the point of the scaffold crossover and each node at each of the vertices being split into two pseudo-nodes; and (v) determining the route of a single-stranded nucleic acid scaffold that traces once along each edge in both directions throughout the entire target structure from an Euler cycle of the network. In some embodiments, the planar representation of the graph of the three-dimensional structure for aiding visualization is the Schlegel diagram of the three-dimensional structure. In preferred embodiments, the spanning tree of the network is a branched spanning tree, such as a breadth-first spanning tree. In a particular embodiment, the spanning tree is determined using Prim's formula. Typically, the Euler circuit calculated from the spanning tree is an A-trail Euler circuit.
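Steps (ii) and (iii) can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not the implementation used by the described methods: the function names, the use of unit edge weights, and the vertex numbering of the tetrahedron are all hypothetical. It applies Prim's algorithm to the node-edge network of a tetrahedron and then marks every edge outside the spanning tree as an edge that carries a scaffold crossover.

```python
import heapq

def prim_spanning_tree(num_nodes, edges, start=0):
    """Prim's algorithm; edges are (u, v, weight) tuples of an undirected graph."""
    adjacency = {n: [] for n in range(num_nodes)}
    for u, v, w in edges:
        adjacency[u].append((w, u, v))
        adjacency[v].append((w, v, u))
    visited = {start}
    frontier = list(adjacency[start])
    heapq.heapify(frontier)
    tree = set()
    while frontier and len(visited) < num_nodes:
        _, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        tree.add(frozenset((u, v)))
        for item in adjacency[v]:
            if item[2] not in visited:
                heapq.heappush(frontier, item)
    return tree

def classify_edges(num_nodes, edges):
    """Split edges into spanning-tree members and crossover-bearing non-members."""
    tree = prim_spanning_tree(num_nodes, edges)
    all_edges = {frozenset((u, v)) for u, v, _ in edges}
    return tree, all_edges - tree

# Tetrahedron: 4 vertices, 6 edges, unit weights (an assumption for illustration).
tetra_edges = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 1), (1, 3, 1), (2, 3, 1)]
tree_edges, crossover_edges = classify_edges(4, tetra_edges)
print(len(tree_edges), len(crossover_edges))  # 3 3
```

For any connected network with V vertices and E edges, the spanning tree has V - 1 edges, so E - V + 1 edges receive a scaffold crossover; for the tetrahedron this gives three of each.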

In some embodiments, DNA is used as a scaffold and the length of each edge is expressed as a multiple of 10.5 base pairs, rounded up, or rounded down to the nearest whole number. Further, the cross-section of the edge is chosen to be composed of 2 helices per edge, 4 helices per edge, 6 helices per edge, 8 helices per edge, 10 helices per edge, or greater than 10 helices per edge. In an exemplary embodiment, the length of each edge is 31 base pairs, 42 base pairs, 52 base pairs, 63 base pairs, 73 base pairs, 84 base pairs, or more than 84 base pairs with 4 helices bundled in parallel on a square lattice, 6 helices bundled in parallel on a honeycomb lattice, 6 helices bundled in parallel on a square lattice, 10 helices bundled in parallel on a honeycomb lattice, or more than 6 or 10 helices in parallel along a square or honeycomb lattice.

In some embodiments, DNA is used as a scaffold and the length of each edge is expressed as an integer number of nucleotides. The edge type can be of 2 helices, 4 helices, 6 helices, or more than 6 helices arranged in parallel along a square lattice, or a honeycomb lattice. These edges can be arranged in a closed or open wireframe structure in 3D or in a 2D wireframe grid, having a planar, spherical, or non-spherical topology. The intersection of the edges at a vertex can be extended in length by the amount necessary to bring the helices precisely together, similar to a beveled edge in woodworking. The additional distance from the helices coming together is spanned by the number of double-helical nucleotides.

In some embodiments, DNA is used as a scaffold that is traced more than two times crossing the vertex, which depends on the number of helices on the edge. For example, in the case of the 3-arm junction, the three edges are connected to each other by three single strands with DNA traced two times through each vertex, and with nine single strands with this approach.

In some embodiments, DNA is used as a scaffold that is traced more than once along the edges of the structure and eliminates the necessity for some or all oligonucleotides to fold.

The route of a single-stranded nucleic acid scaffold that traces throughout the entire target structure and can hybridize to itself is typically identified by a method including: (i) producing a node-edge network representing the three-dimensional structure; (ii) determining a spanning tree of the network corresponding to the three-dimensional structure, for example, where the vertices and lines of the structure are the nodes and edges of the network, respectively; (iii) classifying each edge as one of four types, based on its membership in the spanning tree and the crossover motif employed: if it is not a member of the spanning tree, each fragment of the scaffold exits the edge from the vertex it starts from, if it is a member of the spanning tree, each fragment of the scaffold exits the edge from the vertex it did not start from, and each edge can employ either anti-parallel or parallel crossover motifs; (iv) splitting the edges that are not members of the spanning tree into two edges, each containing a pseudo-node at the point of the scaffold crossover and each node at each of the vertices being split into two pseudo-nodes; and (v) determining the route of a single-stranded nucleic acid scaffold that traces throughout the entire target structure from the Eulerian cycle of the network by superimposing and connecting units of partial scaffold routing within an edge based on its classification and length.
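The role of the Eulerian cycle in step (v) can be illustrated for the simplest case, in which the scaffold traverses every edge exactly once in each direction. The Python sketch below uses Hierholzer's algorithm, which is a generic way to find an Eulerian circuit and is substituted here for illustration only; it does not enforce the A-trail condition, the edge classification, or the crossover placement described above. The function name and the tetrahedron vertex numbering are hypothetical.

```python
def euler_circuit_both_directions(edges, start):
    """Hierholzer's algorithm on the digraph holding one arc per direction of each edge.

    Every node has equal in-degree and out-degree in that digraph, so a
    connected network always admits a closed walk using each arc exactly once.
    """
    out_arcs = {}
    for u, v in edges:
        out_arcs.setdefault(u, []).append(v)
        out_arcs.setdefault(v, []).append(u)
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if out_arcs[node]:
            stack.append(out_arcs[node].pop())  # follow an unused arc
        else:
            circuit.append(stack.pop())         # dead end: emit node
    return circuit[::-1]

# Tetrahedron network: 4 vertices, 6 edges.
tetra = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
route = euler_circuit_both_directions(tetra, start=0)
arcs = list(zip(route, route[1:]))
print(len(arcs))  # 12 directed traversals, two per edge
```

The returned route is a closed walk that starts and ends at the same vertex and uses each of the twelve directed arcs once, which mirrors the requirement that the circular scaffold pass along every edge in both directions.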

In some embodiments, the methods include the step of predicting the three-dimensional structure of the nucleic acid nanostructure. In some embodiments, the methods include the step of making the nucleic acid nanostructure. Therefore, in certain embodiments, the methods include the steps of predicting the three-dimensional structure and making the nucleic acid nanostructure. When the methods include predicting the three-dimensional structure and making the nucleic acid nanostructure, the methods optionally include the step of validating the nucleic acid nanostructure. For example, in some embodiments the nucleic acid nanostructure is validated by comparison with the predicted three-dimensional structure.

In some embodiments, asymmetric polymerase chain reaction (aPCR) is used to synthesize long single-stranded DNA used as a scaffold. Typically, an aPCR reaction is composed of two primers flanking the region of interest to be amplified, a template DNA to replicate from, buffers, nucleotides, and polymerase enzymes, where one of the primers is in excess over the other. In some embodiments, one primer is in 50- or 65-fold molar excess over the second primer. In some embodiments, the scaffold is 500 nucleotides in length; 1000 nucleotides in length; 1500 nucleotides in length; 2000 nucleotides in length; 2500 nucleotides in length; 3281 nucleotides in length; 10,000 nucleotides in length; 12,000 nucleotides in length; or greater than 12,000 nucleotides. Typically, Taq-based polymerases or a commercial blend of enzymes [LONGAMP®] are used as the enzyme in aPCR. In some embodiments, the nucleic acid polymer can be modified by introduction of modified nucleotides into the solution, including fluorescent nucleotides, radio-labeled nucleotides, alternative bases, and modified backbone. In an exemplary embodiment, alternative nucleotides used in the DNA polymer synthesis include Cy5 fluorophore-modified nucleotides, phosphorothioate-modified nucleotides, and deoxyuridines. In another exemplary embodiment, modified primers including additional 5′ sequences to add to the amplicons are used to increase or modify the ssDNA final product or to hybridize to other ssDNA produced by standard synthesis or through aPCR. In another exemplary embodiment, the primers can be phosphorylated for ligation.

Compositions of polyhedral nucleic acid nanostructures designed according to the described methods are also provided. In one embodiment, the polyhedral nucleic acid nanostructures include two nucleic acid anti-parallel helices spanning each edge of the structure. In another embodiment, the polyhedral nucleic acid nanostructures include 4, 6, 8, or more than 8 anti-parallel helices spanning each edge of the structure. The three-dimensional structure is formed from single stranded nucleic acid staple sequences hybridized to a single stranded nucleic acid scaffold sequence. The scaffold sequence is routed through a Eulerian cycle of the network defined by the vertices and lines of the polyhedral structure. The locations of double-stranded crossovers are determined by the spanning tree of the polyhedral structure. The staple sequences are hybridized to the vertices, edges and double-stranded crossovers of the scaffold sequence to define the shape of the nanostructure. In some embodiments, the polyhedral nucleic acid nanostructures include 2 or more than 2 parallel helices spanning each edge of the structure. The three-dimensional structure is formed from a single stranded nucleic acid sequence hybridized to itself. The scaffold sequence is routed through the Eulerian cycle of the network defined by the vertices and edges of the polyhedral structure. In other embodiments, the polyhedral nucleic acid nanostructures include a combination of 2 or more than 2 parallel or anti-parallel helices spanning each edge of the structure. In some embodiments, the polyhedral nucleic acid nanostructure further includes one or more of a therapeutic, diagnostic or prophylactic agent, or combinations. For example, in some embodiments the nanostructures encapsulate one or more therapeutic, diagnostic or prophylactic agents. In other embodiments secondary molecules are either covalently or non-covalently attached to the DNA structural scaffold or oligonucleotides with resulting full control over their 3D organization.
In an exemplary embodiment messenger RNA (mRNA) encoding a protein is non-covalently attached to the DNA nanostructure using single-stranded DNA extensions from the oligonucleotides and complementary to the mRNA. In another exemplary embodiment, the gene editing protein Cpf1 is attached to the DNA nanostructure using double-stranded DNA duplex attached to oligonucleotides and passing through the structure. In another exemplary embodiment, the gene editing protein Cpf1 is attached to the DNA nanostructure using single-stranded DNA extensions from the oligonucleotides and complementary to 3′ extensions of a crRNA loaded to Cpf1. This approach can be generalized to include alternative gene editing proteins and DNA-interacting proteins including Cas9, TAL effectors, and zinc fingers.

Methods of using the polyhedral nucleic acid nanostructures for the delivery of a therapeutic, diagnostic or prophylactic agent to a subject, or using nucleic acid nanostructures as platforms for synthetic vaccines are also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustrating the workflow for top-down sequence design of arbitrary polyhedral DNA origami objects. The scheme depicts: the specified target polyhedral shape; rendering the polyhedral geometry as a node-edge network based on an arbitrary closed surface representation, which is used to create a Schlegel diagram representation and to generate a spanning tree; the spanning tree is used as input to the formula that generates a scaffolding route, from which the routing scheme is determined; the oligonucleotide sequence of the single-stranded scaffold and staple assignment is determined; three different regions depicted as boxes (1, 2 and 3) represent three different types of staple strand, at the vertices, edges and DX crossovers, respectively; the atomic structure is generated assuming canonical B-form DNA geometry, and compared with 3D reconstruction from cryo-electron microscopic imaging.

FIGS. 2A and 2B are charts representing atomic model structures (FIG. 2A), and face-shaded models (FIG. 2B), respectively, for each of several polyhedral shapes. Each shape is appended with a number, 1-45, respectively. The same numbering scheme used in FIG. 2A and FIG. 2B is also used in subsequent figures. The polyhedral shapes include Platonic (1-5), Archimedean (6-15), Johnson (16-25), Catalan (26-35), and other polyhedra (36-45), generated using the top-down design procedure. Miscellaneous polyhedra include (first column) heptagonal bipyramid (36); enneagonal trapezohedron (37); small stellated dodecahedron (38), a type of Kepler-Poinsot solid; rhombic hexecontahedron (39), a type of zonohedron; Goldberg polyhedron with symmetry of Papillomaviridae (40); (second column) double helix (41); nested cube (42); nested octahedron (43); torus (44); and double torus (45). Single-stranded poly-T loops are omitted. Platonic, Archimedean, and Johnson solids each have 52-bp edge length, Catalan solids and the first column of other polyhedra have minimum 42-bp edge length, and the second column of other polyhedra have minimum 31-bp edge length. Objects are not drawn to scale.

The same numbering scheme used in FIG. 2A and FIG. 2B is also used in FIGS. 4A-4H and FIG. 13.

FIGS. 3A-3N are schematic illustrations, each showing a step in the top-down sequence design strategy for a tetrahedron (shape number 1 in FIG. 2A). FIG. 3A is an input tetrahedral shape shown as a mesh having vertices, edges, and faces. FIG. 3B is a 2D projection of FIG. 3A, that is, a Schlegel diagram. FIGS. 3C and 3D each show one of two spanning trees which will lead to a valid scaffold routing. The default is to generate the most-branching tree (FIG. 3C), and an alternative tree is also given (FIG. 3D). FIGS. 3E and 3F show the spanning trees of FIGS. 3C and 3D, respectively, with the edges that are not members of the spanning tree broken in two to define the scaffold crossovers. FIGS. 3G and 3H show the vertices of FIGS. 3E and 3F split according to each spanning tree. FIGS. 3I and 3J show the scaffold routes of each spanning tree of FIGS. 3G and 3H. FIGS. 3K and 3L define the Eulerian circuit of each of two possible routes of FIGS. 3C and 3D. FIG. 3M is a schematic illustrating an example of how scaffold crossover positions are determined for a given edge length in base pairs (bp). FIG. 3N is an enlarged view of FIG. 3K, showing vertex staples (48) and edge staples (50), scaffold nick position (52) and scaffold polarity (arrow 54).

FIGS. 4A-4D depict DNA nanostructures with non-spherical topologies shown as face-shaded models. FIGS. 4E-4H depict predicted computational models of atomic structures for the corresponding scaffolded DNA origami objects, respectively. The objects are nested cube (42) (FIG. 4A, FIG. 4E); nested octahedron (43) (FIG. 4B, FIG. 4F); torus (44) (FIG. 4C, FIG. 4G); and double torus (45) (FIG. 4D, FIG. 4H).

FIGS. 5A-5F show each of six different vertices for predicting 3D structures, for each of three four-way junctions (FIGS. 5A-5C), and three three-way junctions (FIGS. 5D-5F), respectively. The inradius of each vertex is depicted as a circle. The beginning of the DX-tile edge is shown as a rectangle, the edges represented by dotted lines, which meet at a single point. FIG. 5A and FIG. 5D show vertices with equal face angles. FIG. 5B and FIG. 5E show vertices with unequal face angles defined using backbone stretches. FIG. 5C and FIG. 5F show vertices with unequal face angles defined using nucleotide overlap.

FIGS. 6A and 6B are schematics illustrating methods used to produce short ssDNA scaffold strands from M13mp18 ssDNA plasmid, by enzymatic digestion (FIG. 6A), and aPCR amplification (FIG. 6B).

FIGS. 7A-7D depict the custom scaffold sequence routes determined by each of four different Eulerian circuits corresponding to four different polyhedral structures (FIGS. 7A-7D), respectively. The size of each nucleic acid scaffold in bases (b) is shown beneath each depiction.

FIG. 7E is a schematic diagram of object-specific scaffold synthesis using asymmetric PCR. The length for the target structure is identified for amplification using either single- or double-stranded DNA (DNA template) mixed with appropriate (aPCR) primer pairs, with 50× sense primer and 1× anti-sense primer concentration relative to scaffold.

FIGS. 8A-8B are schematics depicting the folding of a tetrahedron using a full M13mp18 scaffold (FIG. 8A) or a short ssDNA scaffold (FIG. 8B), respectively.

FIGS. 9A-9H are schematic representations of different possible vertex staple designs, as characterized on a tetrahedral nucleic acid nanostructure having an isolated vertex, with edge-length of 52-bp. Proposed vertex staple designs with a single cross-over are depicted for each of a 3-way junction (FIGS. 9A, 9E), 4-way junction (FIG. 9B), 5-way junction (FIG. 9C), and 6-way junction (FIG. 9D). FIGS. 9E-9H depict four different types of vertex staple design for a 3-way junction; vertex staple designs are depicted for each of single cross-over (FIG. 9E), one-nick (FIG. 9F), two-nick (FIG. 9G) and without poly-T (FIG. 9H) staple designs, respectively.

FIGS. 9I-9L are line graphs of qPCR data. FIG. 9I shows fluorescence intensity (a.u.) over temperature (° C.) for each of four different vertex staple designs on isolated DX tiles, including single cross-over model (sx); one nick model (N1); two-nick model (N2); and without poly-T model (WOT); designs, respectively. FIG. 9J shows Df/Dt over temperature (° C.) of data in 9I for each vertex staple design, including single cross-over model (sx); one nick model (N1); two-nick model (N2); and without poly-T model (WOT); designs, respectively. FIG. 9K shows fluorescence intensity (a.u.) over temperature (° C.) for each of three different vertex staple designs on a tetrahedral structure with edge-length of 52-bp, including single cross-over model (sx); two-nick model (N2); and without poly-T model (WOT); designs, respectively. FIG. 9L shows Df/Dt over temperature (° C.) for each vertex staple design, including single cross-over model (sx); two-nick model (N2); and without poly-T model (WOT); designs, respectively.

FIGS. 10A-10B are schematics illustrating characterization of DNA origami folding in variable salt and stability in physiological buffer and serum. FIG. 10A is a schematic depicting folding of synthesized DX-based objects in increasing magnesium chloride (MgCl₂) and sodium chloride (NaCl) concentration. FIG. 10B is a schematic depicting characterization of the stability of synthesized DX-based objects in TAE-Mg (12 mM MgCl₂) buffer after buffer exchange and 6 hours in PBS, TAE (without added NaCl or MgCl₂), or DMEM buffer with increasing concentration of FBS (0, 2, and 10%).

FIG. 11 is a graph showing fluorescence (A.U.) over wavelength (nm) for nucleic acid scaffolds of 2,000 nt, produced using a 10% concentration of Cy5-modified dCTP in aPCR.

FIGS. 12A-12E are schematic representations showing how the edge lengths of two adjacent edges at the i-th vertex are modified when the angle between the two edges is larger than the others. FIG. 12A depicts a 4-arm junction, in which four edges, denoted ‘a’ to ‘d’, are connected at the vertex, i. FIG. 12B depicts the minimum angle, θ^(i)_(min), at the i-th vertex and the initial off-set distance (apothem), r_(i), calculated from θ^(i)_(min) and the number of arms in the i-th vertex. FIG. 12C depicts two cylinders (in case of the DX tile design) drawn on each edge with initial off-set distance, r_(i). FIG. 12D depicts two cylinders located in adjacent edges that do not contact each other. The new off-set distance is d^(i)_(a,b), in which the subscript a,b represents the edge identifier. The new off-set distance, d^(i)_(a,b), can be solved with two given distances, m^(i)_(a,b) and n^(i)_(a,b), and the given angle θ^(i)_(a,b). FIG. 12E depicts the size-modified edges incorporating the off-set distance calculated in FIG. 12D.

FIG. 13 is a chart representing a collection of face-shaded models of geometric input into the formula, and the corresponding molecular models of nucleic acid nanostructures having either 2 by 2 square cross-sectional edges, honeycomb (non-beveled) cross-sectional edges, or honeycomb (beveled) cross-sectional edges for each of two Platonic (1, 5), two Archimedean (8, 15), two Johnson (17, 18), and two Catalan (26, 35) polyhedra, respectively, generated using top-down design procedures. Objects are not drawn to scale. Ribbon and space-filling cartoons are shown for each model.

FIGS. 14A-14I are schematic representations outlining each of the nine steps of the top-down sequence design procedure for scaffolded DNA origami nanoparticles composed of an exemplary six-helix bundle polyhedron with honeycomb (non-beveled) cross-sectional morphology. Specification of the arbitrary target geometry (pyramid) is based on a continuous and closed surface that is discretized using a polyhedral mesh (FIG. 14A). This discretized geometry is modified to represent each of the six helices at each edge as lines (FIG. 14B). The endpoints of the lines defined in FIG. 14B are joined such that every duplex becomes part of a loop (□) that has possible scaffold crossovers (∥) between these closed loops (FIG. 14C). The dual graph of the loop-crossover structure is introduced by converting each loop (□) to a node (●), and each double crossover (∥) to an edge (▬), respectively (FIG. 14D). The spanning tree of the dual graph is computed (FIG. 14E). The route of the single-stranded DNA scaffold throughout the entire origami object is produced by inverting the spanning tree of the dual graph (FIG. 14F). Complementary staple strands are assigned based on the scaffold route (FIG. 14G). A 3D cylindrical model (FIG. 14H) and atomic-level structure (FIG. 14I) are generated assuming canonical B-form DNA geometry.

FIGS. 15A-15I are schematic representations outlining each of the nine steps of the top-down sequence design procedure for scaffolded DNA of arbitrary target geometry based on an open surface that is discretized using a polygon mesh. An exemplary structure is shown in FIG. 15A. This discretized geometry is modified to represent two duplexes as two lines (FIG. 15B). The edge length is increased to be closed at the boundary (FIG. 15C). The endpoints of the lines defined in FIG. 15C are joined such that every duplex becomes part of a loop (□) that has possible scaffold crossovers (∥) between these closed loops (FIG. 15D). The dual graph of the loop-crossover structure is introduced by converting each loop (□) to a node (●), and each double crossover (∥) to an edge (▬), respectively (FIG. 15E). The spanning tree of the dual graph is computed (FIG. 15F). The route of the single-stranded DNA scaffold throughout the entire origami object is produced by inverting the spanning tree of the dual graph (FIG. 15G). Complementary staple strands are assigned based on the scaffold route (FIG. 15H). A 3D cylindrical model (FIG. 15I) and atomic-level structure (FIG. 15J) are generated assuming canonical B-form DNA geometry.

FIGS. 16A-16C are atomic-level structural models of polyhedral nucleic acid nanostructures demonstrating schemes for using nucleic acid nanostructures for the capture of other molecules. FIG. 16A shows an open wireframe tetrahedron nanostructure (56) coupled to an mRNA (58) using single-stranded DNA overhangs extended from the staples at nick positions (60), with the sequence of the overhang complementary to predicted loops in the RNA structure. FIG. 16B depicts how CRISPR enzyme Cpf1 (62) with crRNA (64) can be captured onto a DNA nanoparticle (56) on a crossbeam built into the nanoparticle (66), which contains a sequence targeted by the Cpf1/crRNA enzyme. FIG. 16C depicts how CRISPR enzyme Cpf1 (62) with crRNA (64) can be captured onto a DNA nanoparticle (56) on an overhang sequence built into the nanoparticle (68), which contains a sequence complementary to a 3′ extension of the crRNA.

FIGS. 17A-17E are schematic representations outlining each of the steps of the top-down sequence design procedure for scaffolded DNA nanoparticles including parallel crossover (PX) motifs. FIG. 17A depicts the spanning tree obtained for the geometry of the desired polyhedral shape. FIG. 17B is a chart depicting edges classified into one of four categories, based on the crossover motif (PX vs. DX) and its membership in the spanning tree. Edge fragments are superimposed and connected according to the classification and the route of the scaffold through the structure. FIG. 17C depicts the atomic structure predicted in the case where only PX motifs are used, and no staples are present. FIG. 17D depicts the atomic structure predicted in the case where both PX and DX motifs are used, and staples are also added with Watson-Crick base pairing to the scaffold. FIG. 17E depicts the atomic structure model of a pure PX origami structure of another geometry, which has no staples; the object is a single nucleic acid strand routed through every duplex and hybridized to itself.

FIG. 18 is a schematic illustration showing the diversity of nanostructure library design. Diversity is introduced at any level, including object geometry, edge length between vertices, staple nick position or orientation along each edge, bait sequence orientation on the object, and the set of bait sequences that are used to capture the RNAs, which may be guide RNAs, messenger RNAs, or alternatively genes employed for gene therapies.

DETAILED DESCRIPTION OF THE INVENTION

I. Definitions

The term “nucleotide” refers to a molecule that contains a base moiety, a sugar moiety and a phosphate moiety. Nucleotides can be linked together through their phosphate moieties and sugar moieties, creating an inter-nucleoside linkage. The base moiety of a nucleotide can be adenin-9-yl (A), cytosin-1-yl (C), guanin-9-yl (G), uracil-1-yl (U), or thymin-1-yl (T). The sugar moiety of a nucleotide is a ribose or a deoxyribose. The phosphate moiety of a nucleotide is pentavalent phosphate. A non-limiting example of a nucleotide would be 3′-AMP (3′-adenosine monophosphate) or 5′-GMP (5′-guanosine monophosphate). There are many varieties of these types of molecules available in the art and available herein.

The terms “oligonucleotide” and “polynucleotide” refer to synthetic or isolated nucleic acid polymers including a plurality of nucleotide subunits.

The terms “nucleic acid,” “polynucleotide,” and “oligonucleotide” are interchangeable and refer to a deoxyribonucleotide or ribonucleotide polymer, in linear or circular conformation, and in either single- or double-stranded form. For the purposes of the present disclosure, these terms are not to be construed as limiting with respect to the length of a polymer. The terms can encompass known analogues of natural nucleotides, as well as nucleotides that are modified in the base, sugar and/or phosphate moieties (e.g., phosphorothioate backbones, locked nucleic acid). In general and unless otherwise specified, an analogue of a particular nucleotide has the same base-pairing specificity; i.e., an analogue of A will base-pair with T. When double-stranded DNA is described, the DNA can be described according to the conformation adopted by the helical DNA, as either A-DNA, B-DNA, or Z-DNA. The B-DNA described by James Watson and Francis Crick is believed to predominate in cells, and extends about 34 Å per 10 bp of sequence; A-DNA extends about 23 Å per 10 bp of sequence, and Z-DNA extends about 38 Å per 10 bp of sequence.
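The per-10-bp extensions quoted above translate directly into approximate duplex lengths. The following is a minimal sketch, assuming ideal straight helices; the names `RISE_ANGSTROM_PER_BP` and `duplex_length_nm` are hypothetical and are not part of the described methods.

```python
# Approximate rise per base pair, derived from the values quoted above
# (about 34, 23 and 38 angstroms per 10 bp for B-, A- and Z-DNA).
RISE_ANGSTROM_PER_BP = {"B": 3.4, "A": 2.3, "Z": 3.8}


def duplex_length_nm(n_bp: int, form: str = "B") -> float:
    """Return the approximate end-to-end length of an ideal duplex in nm."""
    if n_bp < 0:
        raise ValueError("n_bp must be non-negative")
    # 10 angstroms per nanometre
    return n_bp * RISE_ANGSTROM_PER_BP[form] / 10.0


# Example: a 52-bp edge modelled as B-form DNA
print(round(duplex_length_nm(52), 2))
```

Under these assumptions a 52-bp B-form edge is roughly 17.7 nm long; real structures deviate from this because of crossovers and vertex geometry.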

In some cases nucleotide sequences are provided using character representations recommended by the International Union of Pure and Applied Chemistry (IUPAC) or a subset thereof. IUPAC nucleotide codes used herein include A=Adenine, C=Cytosine, G=Guanine, T=Thymine, U=Uracil, R=A or G, Y=C or T, S=G or C, W=A or T, K=G or T, M=A or C, B=C or G or T, D=A or G or T, H=A or C or T, V=A or C or G, N=any base, “.” or “-”=gap. In some embodiments the set of characters is (A, C, G, T, U) for adenosine, cytidine, guanosine, thymidine, and uridine, respectively. In some embodiments the set of characters is (A, C, G, T, U, I, X, Ψ) for adenosine, cytidine, guanosine, thymidine, uridine, inosine, xanthosine, and pseudouridine, respectively. In some embodiments the set of characters is (A, C, G, T, U, I, X, Ψ, R, Y, N) for adenosine, cytidine, guanosine, thymidine, uridine, inosine, xanthosine, pseudouridine, unspecified purine, unspecified pyrimidine, and unspecified nucleotide, respectively. The modified sequences, non-natural sequences, or sequences with modified binding, may be in the genomic, the guide or the tracr sequences.
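As an illustration of how the ambiguity codes listed above can be interpreted, the sketch below enumerates every unambiguous DNA sequence matched by an IUPAC string. The table `IUPAC` and the function `expand` are hypothetical names introduced here; U and the gap characters are left out for brevity.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes as listed above (DNA letters only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq: str) -> list[str]:
    """Enumerate every unambiguous sequence matched by an IUPAC string."""
    choices = [IUPAC[ch] for ch in seq.upper()]
    return ["".join(p) for p in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```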

The terms “polypeptide,” “peptide” and “protein” are used interchangeably to refer to a polymer of amino acid residues. The term also applies to amino acid polymers in which one or more amino acids are chemical analogues or modified derivatives of corresponding naturally-occurring amino acids.

The terms “cleavage” or “cleaving” of nucleic acids refer to the breakage of the covalent backbone of a nucleic acid molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered “sticky” ends. In certain embodiments cleavage refers to the double-stranded cleavage between nucleic acids within a double-stranded DNA or RNA chain.

Nucleotide and/or amino acid sequence identity percent (%) is understood as the percentage of nucleotide or amino acid residues that are identical with nucleotide or amino acid residues in a candidate sequence in comparison to a reference sequence when the two sequences are aligned. To determine percent identity, sequences are aligned and, if necessary, gaps are introduced to achieve the maximum percent sequence identity. Sequence alignment procedures to determine percent identity are well known to those of skill in the art. Often publicly available computer software such as BLAST, BLAST2, ALIGN2 or MEGALIGN (DNASTAR) software is used to align sequences. Those skilled in the art can determine appropriate parameters for measuring alignment, including any formulas needed to achieve maximal alignment over the full length of the sequences being compared. When sequences are aligned, the percent sequence identity of a given sequence A to, with, or against a given sequence B (which can alternatively be phrased as a given sequence A that has or comprises a certain percent sequence identity to, with, or against a given sequence B) can be calculated as: percent sequence identity = X/Y × 100, where X is the number of residues scored as identical matches by the sequence alignment program's or formula's alignment of A and B and Y is the total number of residues in B. If the length of sequence A is not equal to the length of sequence B, the percent sequence identity of A to B will not equal the percent sequence identity of B to A. Mismatches can be similarly defined as differences between the natural binding partners of nucleotides. The number, position and type of mismatches can be calculated and used for identification or ranking purposes.
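The percent identity calculation described above can be sketched as follows. This is an illustrative example only: it assumes the two sequences have already been aligned by an external program, that gaps are written as “-”, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of A to B, computed as 100 * X / Y.

    X: aligned positions where A and B carry the same residue.
    Y: total number of residues in B (gap characters excluded).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    x = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    y = sum(1 for b in aligned_b if b != gap)
    if y == 0:
        raise ValueError("sequence B has no residues")
    return 100.0 * x / y


# A has 4 residues, B has 6; the alignment pads A with two gaps.
a = "ACGT--"
b = "ACGTAC"
print(percent_identity(a, b))  # identity of A to B
print(percent_identity(b, a))  # identity of B to A
```

Because Y counts the residues of the second sequence, the two calls differ here (4/6 versus 4/4), which reflects the asymmetry noted in the text for sequences of unequal length.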

The term “endonuclease” refers to any wild-type or variant enzyme capable of catalyzing the hydrolysis (cleavage) of bonds between nucleic acids within a DNA or RNA molecule, preferably a DNA molecule. Non-limiting examples of endonucleases include type II restriction endonucleases such as FokI, HhaI, HindIII, NotI, BbvCI, EcoRI, BglII, and AlwI. Endonucleases also include rare-cutting endonucleases, which typically have a polynucleotide recognition site of about 12-45 base pairs (bp) in length, more preferably of 14-45 bp. Rare-cutting endonucleases induce DNA double-strand breaks (DSBs) at a defined locus. Rare-cutting endonucleases can for example be a homing endonuclease, a mega-nuclease, a chimeric Zinc-Finger nuclease (ZFN) or TAL effector nuclease (TALEN) resulting from the fusion of engineered zinc-finger domains or TAL effector domain, respectively, with the catalytic domain of a restriction enzyme such as FokI, other nuclease or a chemical endonuclease including CRISPR/Cas9 or other variant and guide RNA.

The term “exonuclease” refers to any wild-type or variant enzyme capable of removing nucleic acids from the terminus of a DNA or RNA molecule, preferably a DNA molecule. Non-limiting examples of exonucleases include exonuclease I, exonuclease II, exonuclease III, exonuclease IV, exonuclease V, exonuclease VI, exonuclease VII, Xrn1, and Rat1.

In some cases an enzyme is capable of functioning both as an endonuclease and as an exonuclease. The term “nuclease” generally encompasses both endonucleases and exonucleases; however, in some embodiments the terms “nuclease” and “endonuclease” are used interchangeably herein to refer to endonucleases, i.e., to refer to enzymes that catalyze bond cleavage within a DNA or RNA molecule.

As used herein, the term “ligating” refers to enzymatic reactions in which two double-stranded DNA molecules are covalently joined, for example, catalyzed by a ligase enzyme.

As used herein, the terms “aligning” and “alignment” refer to the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below.

The terms “scaffolded DNA origami”, “DNA origami” or “DNA nanostructure” are used interchangeably. They can refer to methods of using numerous short single strands of nucleic acids (staple strands) (e.g., DNA) to direct the folding of a long, single strand of polynucleotide (scaffold strand) into desired shapes on the order of about 10 nm to a micron or more, and the structures formed therefrom.

The term “polyhedron” refers to a three-dimensional solid figure in which each side is a flat surface. These flat surfaces are polygons and are joined at their edges.

The terms “staple strands” or “helper strands” are used interchangeably. “Staple strands” or “helper strands” refer to oligonucleotides that hold the scaffold DNA in its three-dimensional wireframe geometry. Additional nucleotides can be added to the staple strand at either the 5′ end or the 3′ end, and those are referred to as “staple overhangs”. Staple overhangs can be functionalized to have desired properties such as a specific sequence to hybridize to a target nucleic acid sequence, or a targeting element. In some instances, the staple overhang is biotinylated for capturing the DNA nanostructure on a streptavidin-coated bead. In some instances, the staple overhang can also be modified with chemical moieties. Non-limiting examples include Click-chemistry groups (e.g., azide group, alkyne group, DIBO/DBCO), amine groups, and thiol groups. In some instances some bases located inside the oligonucleotide can be modified using base analogs (e.g., 2-Aminopurine, Locked nucleic acids, such as those modified with an extra bridge connecting the 2′ oxygen and 4′ carbon) to serve as a linker to attach functional moieties (e.g., lipids, proteins). Alternatively, DNA-binding proteins or guide RNAs can be used to attach secondary molecules to the DNA scaffold.

The terms “geometry” or “geometric parameters” refer to the angles and/or relative distances that describe any two connected edges of a shape, such as those that define the relative position of faces, and the properties of the vertices and edges that form the three-dimensional solid.

The term “arbitrary geometry” refers to a non-specific three-dimensional shape, for example, any desired three-dimensional closed surface that can be rendered as a polyhedral wire mesh.

The term “network” refers to a representation of the lines and vertices that defines the relations between the lines and vertices within the objects. In some embodiments, vertices are represented as nodes and lines are represented as edges in a graph. The degree (or valency) of a vertex of a graph is the number of edges incident to the vertex.

The term “spanning tree” refers to a subset of edges and all of the nodes in a graph, such as the graphical node-edge network corresponding to the lines and vertices of a polyhedral shape. Typically, spanning trees include all the vertices. Different spanning trees for a given network can cover different edges. A breadth-first spanning tree includes the maximum number of branches.

The terms “Euler Path”, “Eulerian Trail”, or “Eulerian Path” refer to a trail in a graph which visits every edge exactly once. The terms “Euler circuit”, “Euler Cycle”, “Eulerian Cycle” or “Eulerian Circuit” are used interchangeably and refer to a trail in a graph which visits every edge exactly once, and which starts and ends on the same vertex. For the existence of Eulerian cycles it is necessary that every vertex has even degree, and that all of the vertices with nonzero degree belong to a single connected component.
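The existence condition stated above can be checked mechanically. The sketch below is illustrative only; the function name `has_eulerian_circuit` and the edge-list representation are assumptions made for this example. It also shows why doubling every edge, as happens when a route passes twice along each edge, always satisfies the even-degree condition.

```python
from collections import defaultdict


def has_eulerian_circuit(edges):
    """Test the classical condition for an undirected multigraph: every
    vertex has even degree, and all vertices of nonzero degree belong to
    a single connected component."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return True
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return False
    # Depth-first search from any vertex that has at least one edge.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


# Tetrahedron wireframe: every vertex has degree 3, so no Eulerian circuit.
tetra = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(has_eulerian_circuit(tetra))          # False
# Doubling each edge makes every degree even (6), so a circuit exists.
print(has_eulerian_circuit(tetra + tetra))  # True
```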

The term “loop-crossover structure” refers to a 3D structure in which endpoints are joined such that every duplex becomes part of a loop, and positions of possible scaffold double crossovers are found between two loops.

The term “dual graph” refers to the graph obtained by converting each loop to a node and each double crossover to an edge.

The terms “DX crossover”, “antiparallel crossover”, or “DX motif” are used interchangeably, and refer to an antiparallel double crossover nucleic acid motif consisting of two four-arm Holliday junctions, joined by two helical arms at two adjacent arms. The antiparallel orientation of the nucleic acid helical domains in antiparallel DX motifs implies that the major groove of one nucleic acid helix faces the minor groove of the other where the engaged helices come together in each turn.

The terms “PX crossover”, “parallel crossover”, or “PX motif” are used interchangeably to refer to a four-stranded DNA motif wherein two parallel double helices are joined by reciprocal exchange (crossing over) of strands of the same polarity at every point where the strands come together (see Seeman, Nano Letters 1, (1), pp. 22-26 (2001); Wang, PNAS, 9. 107 (28), pp. 12547-12552 (2010)). No strand breakage and rejoining is needed, because two double helices can form PX-DNA merely by inter-wrapping. The reciprocal exchange between two double-stranded nucleic acid helices can occur between two helices having either the same or opposite strand polarity. An exemplary PX motif is the “paranemic crossover”. PX motifs are usually followed by a pair of numbers, e.g., PX65 motif, that describe the number of base pairs in the major groove and minor groove of the double helices, respectively, between parallel crossovers. The number of base pairs in the major groove is typically greater than that in the minor groove. Exemplary PX motifs include PX65, PX75, PX85, PX95, PX64, PX74, and PX66 (Maiti, et al., Biophysical Journal. 90, 1463-1479 (2006); Shen, et al., J. Am. Chem. Soc. 126, 1666-1674 (2004)).

The term “bait sequence” refers to a single-stranded nucleic acid sequence that is complementary to any fragment of a target nucleic acid sequence, such as an RNA, for capturing the target nucleic acids. Typically, bait sequences are appended to or otherwise are present as part of the staple sequence of a nucleic acid nanostructure, for example, as an “overhang” sequence of nucleic acids. In some embodiments, bait sequences are complementary to loop regions or single-stranded regions of target RNAs for capturing the RNAs. Alternatively, the bait sequences tether proteins or other ligands that target binding regions to capture a structured RNA or DNA assembly of interest via avidity enhancement.

The term “nucleic acid capture” refers to binding of any nucleic acid molecule of interest having complementary nucleic acid sequences to the bait sequences on the DNA nanostructures, or having affinity for the capture bait probe employed, and being immobilized or attached to the DNA nanostructures via hybridization to the bait sequence, or binding. For example, “RNA capture” refers to binding of any ribonucleic acid molecule of interest to the bait sequences on the DNA nanostructures. Nucleic acids of interest can bind to the inside or outer surface of a nucleic acid nanostructure.

The phrase that a molecule “specifically binds” to a target refers to a binding reaction which is determinative of the presence of the molecule in the presence of a heterogeneous population of other biologics. Thus, under designated immunoassay conditions, a specified molecule binds preferentially to a particular target and does not bind in a significant amount to other biologics present in the sample. Specific binding of an antibody to a target under such conditions requires the antibody be selected for its specificity to the target. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See, e.g., Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity. Specific binding between two entities means an affinity of at least 10⁶, 10⁷, 10⁸, 10⁹, or 10¹⁰ M⁻¹. Affinities greater than 10⁸ M⁻¹ are preferred.

The term “targeting molecule” refers to a substance which can direct a nanoparticle to a receptor site on a selected cell or tissue type, can serve as an attachment molecule, or serve to couple or attach another molecule. The term “direct” refers to causing a molecule to preferentially attach to a selected cell or tissue type. This can be used to direct cellular materials, molecules, or drugs, as discussed below.

The terms “antibody” or “immunoglobulin” are used to include intact antibodies and binding fragments thereof. Typically, fragments compete with the intact antibody from which they were derived for specific binding to an antigen. Fragments include separate heavy chains, light chains, Fab, Fab′, F(ab′)2, Fabc, and Fv. Fragments are produced by recombinant DNA techniques, or by enzymatic or chemical separation of intact immunoglobulins. The term “antibody” also includes one or more immunoglobulin chains that are chemically conjugated to, or expressed as, fusion proteins with other proteins. The term “antibody” also includes a bispecific antibody. A bispecific or bifunctional antibody is an artificial hybrid antibody having two different heavy/light chain pairs and two different binding sites. Bispecific antibodies can be produced by a variety of methods including fusion of hybridomas or linking of Fab′ fragments. See, e.g., Songsivilai and Lachmann, Clin. Exp. Immunol., 79:315-321 (1990); Kostelny, et al., J. Immunol., 148, 1547-1553 (1992).

The terms “epitope” or “antigenic determinant” refer to a site on an antigen to which B and/or T cells respond. B-cell epitopes can be formed both from contiguous amino acids or noncontiguous amino acids juxtaposed by tertiary folding of a protein. Epitopes formed from contiguous amino acids are typically retained on exposure to denaturing solvents whereas epitopes formed by tertiary folding are typically lost on treatment with denaturing solvents. An epitope typically includes at least 3, and more usually, at least 5 or 8-10, amino acids, in a unique spatial conformation. Methods of determining spatial conformation of epitopes include, for example, x-ray crystallography and 2-dimensional nuclear magnetic resonance.

The term “small molecule,” as used herein, generally refers to an organic molecule that is less than about 2,000 g/mol in molecular weight, less than about 1,500 g/mol, less than about 1,000 g/mol, less than about 800 g/mol, or less than about 500 g/mol. Small molecules are non-polymeric and/or non-oligomeric.

II. Methods for Design of Nucleic Acid Nanostructures

Systems and methods for the automated, step-wise design of a nucleic acid nanostructure having arbitrary geometries have been established. The systems and methods generally involve rendering the geometric parameters of a desired polyhedral form as a node-edge network, and determining the nucleic acid scaffold route and staple design parameters necessary to form the desired polyhedral structure. Therefore, methods for generating the sequences of the single-stranded nucleic acid scaffold and the nucleic acid sequence of staple strands that combine to form a nucleic acid nanostructure having the desired shape are provided. An exemplary method for designing a nucleic acid nanostructure having a desired polyhedral form includes selecting a desired 3D polyhedral or 2D polygon form as a target structure; providing geometric parameters and physical dimensions of the target structure for a selected 3D polyhedral or 2D polygon form; identifying the route of a single-stranded nucleic acid scaffold that traces throughout the entire target structure; and generating the sequences of the single-stranded nucleic acid scaffold and/or the nucleic acid sequence of staple strands that combine to form a nucleic acid nanostructure having the desired shape.

DNA nanostructures having the desired shape are produced by folding a long single-stranded polynucleotide, referred to as a “scaffold strand”, into a desired shape or structure using a number of small “staple strands” as glue to hold the scaffold in place. Typically, the number of staple strands will depend upon the size of the scaffold strand, the complexity of the shape or structure, the types of crossover motifs employed, and the number of helices per edge. For example, for relatively short scaffold strands (e.g., about 150 to 1,500 bases in length) and/or simple structures the number of staple strands is small (e.g., about 5, 10, 50 or more). For longer scaffold strands (e.g., greater than 1,500 bases) and/or more complex structures, the number of staple strands can be several hundreds to thousands (e.g., 50, 100, 300, 600, 1,000 or more helper strands). Using parallel crossover motifs, however, the number of staples can be reduced, even to zero.

The choice of staple strands and, in some instances, the programmed self-hybridization of the scaffold strand, determine the pattern. In some embodiments, a software program is used to identify the staple strands needed to form a given design.

Typically, the methods include one or more of the following steps:

(a) Selecting a target polyhedral structure;

(b) Choosing the cross-section geometry of the edge of 2 helices, 4 helices on a square or honeycomb lattice, 6 helices on a square or honeycomb lattice, or any even number of helices on a square or honeycomb lattice;

(c) Determining the spatial coordinates of all vertices, the edge connectivity between vertices, and the faces to which vertices belong in the target structure;

(d) Identifying the route of a single-stranded nucleic acid scaffold sequence that traces throughout the entire target polyhedral structure; and

(e) Determining the nucleic acid sequence of the single-stranded nucleic acid scaffold and, optionally, the nucleic acid sequence of corresponding staple strands.
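To make the input of step (c) concrete, the sketch below derives the edge connectivity from a list of faces and checks the Euler characteristic of the mesh. The tetrahedron coordinates, the face list and the function name `edges_from_faces` are hypothetical illustrations, not a file format prescribed by the methods.

```python
# Hypothetical input: a regular tetrahedron given as vertex coordinates
# and faces, where each face is a tuple of vertex indices.
vertices = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
faces = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]


def edges_from_faces(faces):
    """Collect the unique undirected edges on the boundary of every face."""
    edges = set()
    for face in faces:
        for i, u in enumerate(face):
            v = face[(i + 1) % len(face)]  # next vertex around the face
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


edges = edges_from_faces(faces)
# Euler characteristic V - E + F; it equals 2 for a closed surface that is
# topologically equivalent to a sphere.
euler = len(vertices) - len(edges) + len(faces)
print(len(edges), euler)  # 6 2
```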

Typically, the route of the scaffold nucleic acid is identified by

(i) Determining edges that form the spanning tree of the node-edge network (for example, using Prim's algorithm);

(ii) Bisecting each edge that does not form the spanning tree to form two split edges;

(iii) Determining an Eulerian circuit that passes twice along each edge of the spanning tree. The direction of the continuous scaffold sequence is reversed at the bisecting point of the node-edge network in a DX-anti-parallel crossover, and the Eulerian circuit defines the route of a single-stranded nucleic acid scaffold sequence that passes throughout the entire structure. Staple strands are located at the vertices and edges of the route of the single-stranded nucleic acid scaffold sequence determined in (d).
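Steps (i) to (iii) can be sketched on a small wireframe as follows. This is a simplified illustration under stated assumptions, not the described software: all edges are given unit weight in Prim's algorithm, each half of a bisected non-tree edge is modelled as a dead-end branch ending at a hypothetical `("mid", u, v)` node where the scaffold turns back, and the function names are introduced here for illustration only.

```python
import heapq
from collections import defaultdict


def prim_spanning_tree(n_vertices, edges):
    """Prim's algorithm with unit weights; returns the set of tree edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    in_tree, tree = {0}, set()
    heap = [(1, 0, v) for v in adj[0]]
    heapq.heapify(heap)
    while heap and len(in_tree) < n_vertices:
        _, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.add((min(u, v), max(u, v)))
        for w in adj[v]:
            if w not in in_tree:
                heapq.heappush(heap, (1, v, w))
    return tree


def scaffold_route(n_vertices, edges):
    """Return (tree, route), where route is a closed walk that traverses
    every tree edge and every half of a bisected non-tree edge twice."""
    tree = prim_spanning_tree(n_vertices, edges)
    adj = defaultdict(list)
    for u, v in edges:
        if (min(u, v), max(u, v)) in tree:
            adj[u].append(v)
            adj[v].append(u)
        else:
            # Bisect the edge: each half ends at its own midpoint node.
            adj[u].append(("mid", u, v))
            adj[v].append(("mid", v, u))
    # The expanded graph is a tree, so walking out along each branch and
    # back again is an Eulerian circuit of the doubled graph.
    route, seen = [0], {0}

    def visit(node):
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                route.append(nxt)
                visit(nxt)
                route.append(node)

    visit(0)
    return tree, route


tetra = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree, route = scaffold_route(4, tetra)
print(len(tree), len(route) - 1)  # 3 18
```

For the tetrahedron, the 3 tree edges plus the 6 halves of the 3 bisected edges give 9 segments, each traversed twice, hence 18 steps in the closed walk.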

Typically, for the origami nanostructures that incorporate parallel crossovers, the route of the scaffold nucleic acid is identified by determining an Eulerian circuit that passes twice or more than twice along each edge of the wireframe. Based on the length and spanning tree classification, units of partial scaffold routing are superimposed and connected to complete the circuit.

In some embodiments the methods further include the steps of

(i) Detaching and scaling each edge of the initial geometry to represent the number of helices as lines indicating their lengths and endpoints;

(ii) Generating the loop-crossover structure by joining endpoints and finding double crossovers between two loops;

(iii) Generating the dual graph of the loop-crossover structure by converting each loop to a node and each double crossover to an edge;

(iv) Computing the spanning tree of the dual graph of the loop-crossover structure (for example, using Prim's algorithm);

(v) Inverting the dual graph back to the loop-crossover structure, but without the double crossovers corresponding to edges that are not members of the spanning tree. Edges that are members of the spanning tree correspond to the subset of crossovers required to complete the Eulerian circuit.
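The selection of crossovers on the dual graph can be illustrated with a short sketch. Note that it swaps in a union-find (Kruskal-style) construction in place of Prim's algorithm, since any spanning tree serves the purpose here; the loop indices, candidate crossover list and function name `select_crossovers` are hypothetical.

```python
def select_crossovers(n_loops, candidate_crossovers):
    """Pick a spanning tree of the dual graph using union-find.

    Nodes are loops and edges are candidate double crossovers. Each
    accepted crossover merges two previously separate loops, so N loops
    need exactly N - 1 crossovers to become one continuous route.
    """
    parent = list(range(n_loops))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for a, b in candidate_crossovers:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            chosen.append((a, b))
    if len(chosen) != n_loops - 1:
        raise ValueError("dual graph is not connected")
    return chosen


# Hypothetical example: 4 loops with 5 candidate double crossovers.
candidates = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
print(select_crossovers(4, candidates))  # [(0, 1), (1, 2), (2, 3)]
```

The two rejected candidates, (2, 0) and (3, 0), are the non-member edges: placing a crossover there would split the single route back into separate loops.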

In some embodiments the methods further include the step of

(f) Modelling the 3-dimensional coordinates of each nucleic acid according to the parameters determined in (c) and (d).

In further embodiments the methods further include the step of

(g) Assembling and optionally purifying the nucleic acid nanostructures designed by the methods of any of the steps (a) through (d).

Each of these steps is discussed in more detail, below.

The method described herein is a “top-down approach” to designing the structure (i.e., the only input is a “shape” and the number and geometry of helices per edge). Nothing else is required, except for optional selection of a size and an input sequence (otherwise, default parameters can be used for both).

Default parameters for input scaffold size, nucleic acid type, input scaffold sequence, edge length, number of helices per edge, cross-sectional morphology of edges and vertex geometry (i.e., beveled or non-beveled edges) can be used as necessary to generate the sequences of staples and/or scaffold nucleic acid when no value is specified. For example, in some embodiments, the default nucleic acid is B-DNA, and the default edge length is 31 bases, with 2 helices per edge. In some embodiments, the default nucleic acid scaffold sequence is the 7,249 nt M13mp18 bacteriophage DNA. In some embodiments, when the number of helices per edge is specified, but the vertex morphology is not specified, the default vertex geometry is to use honeycomb morphology with beveled edges.
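One way to picture this fallback behavior is a parameter record whose fields carry the defaults named above and are overridden only by values the user supplies. The class `DesignParams` and the helpers `with_overrides` and `rough_scaffold_demand` are hypothetical; in particular, the scaffold estimate is a crude assumption (one scaffold pass per helix per edge) that ignores vertex, crossover and unpaired nucleotides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DesignParams:
    nucleic_acid: str = "B-DNA"
    edge_length: int = 31           # default edge length stated in the text
    helices_per_edge: int = 2
    scaffold_name: str = "M13mp18"
    scaffold_length_nt: int = 7249


def with_overrides(**user_values) -> DesignParams:
    """Start from the defaults and apply only the values actually supplied
    (arguments left as None fall back to the default)."""
    supplied = {k: v for k, v in user_values.items() if v is not None}
    return replace(DesignParams(), **supplied)


def rough_scaffold_demand(n_edges: int, p: DesignParams) -> int:
    """Crude estimate of scaffold nucleotides needed for n_edges edges."""
    return n_edges * p.helices_per_edge * p.edge_length


params = with_overrides(edge_length=52, helices_per_edge=None)
print(params.edge_length, params.helices_per_edge)  # 52 2
# A tetrahedron has 6 edges; check that the default scaffold is long enough.
print(rough_scaffold_demand(6, params) <= params.scaffold_length_nt)  # True
```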

This is fundamentally different from bottom-up design methods. For example, the “bottom-up approach” does not produce the sequences of staple strands, but requires manual intervention via a heuristic approach, using multiple duplex arms combined together to form the structure (i.e., may not use a single scaffold sequence throughout). The top-down methods start with the desired output, i.e., the final structure and the use of a specific scaffold, and generate the sequences required to synthesize that output, using a single ssDNA scaffold that is routed throughout the entire structure. The scaffold can be a user-defined scaffold sequence, and the staple sequences are varied accordingly.

The approach is powerful because it can exploit the single scaffold strand to enable downstream applications, such as DNA RAM storage (i.e., a single strand of DNA is folded into each object), as well as other applications.

The formula uses a maximum-breadth spanning tree to determine positions of the scaffold crossovers for the scaffold routing. Any spanning tree, however, will lead to a valid scaffold routing. The nanostructures themselves are distinct in having a continuous single-stranded nucleic acid sequence routed through each edge of the structure.
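The statement that any spanning tree yields a valid routing can be illustrated by building two different spanning trees of the same wireframe and counting the edges left out of each. In the sketch below, the function `spanning_tree`, the cube wireframe and the use of a queue versus a stack to obtain two different trees are illustrative assumptions; the stack variant is not a strict depth-first search.

```python
from collections import deque


def spanning_tree(n, edges, breadth_first=True):
    """Grow a spanning tree from vertex 0 using a queue (breadth-first)
    or a stack (depth-biased) as the frontier."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, tree, frontier = {0}, [], deque([0])
    while frontier:
        u = frontier.popleft() if breadth_first else frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                frontier.append(v)
    return tree


# Cube wireframe: vertices 0..7, edges join labels differing in one bit.
cube = [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
        if u < (u ^ (1 << b))]

for bfs in (True, False):
    tree = spanning_tree(8, cube, breadth_first=bfs)
    non_tree = len(cube) - len(tree)
    print(len(tree), non_tree)  # 7 5 in both cases
```

Every spanning tree of a connected graph with V vertices has V - 1 edges, so the number of non-tree edges to be bisected is always E - V + 1 (here 12 - 8 + 1 = 5); only their positions change with the choice of tree.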

A. Providing a Target Structure

Methods for the step-wise design of a nucleic acid nanostructure, based on the arbitrary wireframe geometry of the desired (target) structure as a starting model, have been developed. The methods are useful to provide design parameters, such as the sequence of a single-stranded DNA “scaffold”, as well as the corresponding single-stranded edge and vertex “staple” sequences necessary to form a nucleic acid nanostructure having the shape of the input structure.

1. Selection of Target Structure

The methods require geometric parameters that define the target shape as input. Therefore, the starting point for the design process is the selection of a target shape. Any arbitrary geometric shape that can be rendered as a “wireframe” model can be selected as input for the design of nucleic acid assemblies.

Exemplary target shapes include three-dimensional structures, including, but not limited to, Platonic polyhedrons, Archimedean polyhedrons, Johnson polyhedrons, Catalan solids, or asymmetric three-dimensional structures. In some embodiments, the target structure has a programmed geometry that is topologically equivalent to that of a sphere. In other embodiments, the target structure has a programmed geometry that is topologically distinct from that of a sphere. For example, target structures including nested structures and toroidal structures can be designed using the described methods. In other embodiments, the target structure has a programmed geometry that is topologically equivalent to a plane. For example, target structures include triangular mesh, square mesh, or other mesh.

Target structures can be selected based upon one or more design criteria, or can be selected randomly. In some embodiments, structures are selected based on existing ‘natural’ 3-dimensional organizations (e.g., virus capsids, antigens, toxins, etc.). Therefore, in some embodiments, target shapes are designed for use directly or as part of a system to mediate biological or other responses which are dependent upon, or otherwise influenced by, 3D geometric spatial properties. For example, in some embodiments, all or part of a structure is designed to include architectural features known to elicit or control one or more biological functions. In some embodiments, structures are designed to fulfill the 3D geometric spatial requirements to induce, prevent, stimulate, activate, reduce or otherwise control one or more biological functions. Typically, the desired shape defines a specific geometric form that will constrain the other physical parameters, such as the absolute size of the particle. For example, the minimum size of nucleic acid nanostructures designed according to the described methods will depend upon the degree of complexity of the desired shape.

i. 2-Dimensional Wireframe Structures

Target structures can be any solid in two dimensions. Therefore, target structures can be a grid or mesh or wireframe topologically similar to a 2D surface or plane. The grid or mesh can be composed of regular or irregular geometries that can be tessellated over a surface.

Exemplary target structures include triangular lattices, square lattices, pentagonal lattices, or lattices of more than 5 sides. 2D structures can be designed to have varied length and thickness in each dimension. In some embodiments, the edges of 2D nanostructures include a single nucleic acid helix. In other embodiments, the edges of 2D nanostructures include two or more nucleic acid helices. For example, in some embodiments, each edge of the 2D nanostructure includes 2 helices, 4 helices, 6 helices or 8 helices, or more than 8 helices, up to 100 helices per edge, although theoretically unlimited in number.

ii. Polyhedral Structures

Target structures can be any solid in three dimensions that can be rendered with flat polygonal faces, straight edges and sharp corners or vertices.

Exemplary basic target structures include cuboidal structures, icosahedral structures, tetrahedral structures, cuboctahedral structures, octahedral structures, and hexahedral structures. In some embodiments, the target structure is a convex polyhedron, or a concave polyhedron. For example, in some embodiments, the methods design nucleic acid assemblies of a uniform polyhedron that has regular polygons as faces and is isogonal. In other embodiments, the methods design nucleic acid assemblies of an irregular polyhedron that has unequal polygons as faces. In further embodiments, the target structure is a truncated polyhedral structure, such as a truncated cuboctahedron.

Platonic polyhedrons include polyhedrons with multiple faces, for example, 4 faces (tetrahedron (1)), 6 faces (cube or hexahedron (2)), 8 faces (octahedron), 12 faces (dodecahedron), and 20 faces (icosahedron).

In some embodiments, the target structure is a nucleic acid assembly that has a non-spherical geometry. Therefore, in some embodiments, the target structure has a geometry with “holes”. Exemplary non-spherical geometries include toroidal polyhedra and nested shapes. Exemplary toroidal polyhedra include a torus and double torus. Exemplary topologies of nested shapes include nested cube and nested octahedron. Exemplary polyhedral forms are depicted in FIGS. 2A-2B.

In other embodiments, target structures can be a combination of one or more of the same or different polyhedral forms, linked by a common contiguous edge.

iii. Reinforced Polyhedral Structures

In some embodiments, the target structure is a reinforced structure. Reinforced structures are structures that share the same polyhedral form as the equivalent, non-reinforced structure, and include one or more additional edges spanning between two vertices. Typically, the reinforced structure contains at least one more edge than the corresponding non-reinforced structure. In some embodiments, additional structural elements that appear as “cross-bars” spanning between two vertices are introduced.

In some embodiments, a structure is reinforced by the addition of one or more edges passing internally within the space enclosed by the structure. Therefore, in some embodiments reinforced structures encapsulate a smaller volume than the corresponding non-reinforced structure. In other embodiments, a structure is reinforced by the addition of one or more edges that connect vertices by spanning a face of the polyhedron. In further embodiments, a polyhedral nanostructure is reinforced by including one or more additional edges that connect vertices by spanning a face of the polyhedron and one or more additional edges that connect vertices by passing internally within the space enclosed by the structure. In some embodiments, a polyhedral nanostructure is reinforced by addition of one or more edges that bisect a face of the polyhedron and addition of one or more vertices.

iv. Other Structures

In some embodiments the desired structure has a shape that is visually or geometrically similar to a biological structure, such as the shape of a viral particle, or a sub-component of a viral particle; a protein; or a sub-component of a protein.

2. Providing Geometric Parameters of the Target Structure

The methods can include the step of providing the geometric parameters that define the target structure. Geometric parameters include the spatial coordinates of all vertices, the edge connectivity between vertices, and the faces to which vertices belong. Geometric parameters can be determined using any means that represents the form of the target structure. Typically, geometric parameters are determined by rendering the target structure as a wire-frame mesh. In some embodiments the determination of geometric parameters for an input shape is carried out using a computer-based interface. Therefore, in some embodiments geometric parameters of a target shape are determined in silico. Typically, in silico determination of geometric parameters can require input of a target shape, or input of the rendered wire-frame model of the target shape. In some embodiments, the only input is a target shape, or input of the rendered wire-frame model of the target shape. For example, in some embodiments, following input of a target shape and/or geometric parameters corresponding to a target shape, all other steps are performed within an automated system, such as by a computer using software including each of the method steps, optionally incorporating one or more default parameters. In some embodiments, the input is a 2-dimensional shape, or geometric parameters of a 2-dimensional shape. In some embodiments, the target structure is the three-dimensional form corresponding to the 2-dimensional shape. For example, a 3-dimensional cuboidal structure can be inferred from input of the geometric parameters of one or more of the faces of the 3-dimensional structure. In an exemplary embodiment, a single square face is input and the corresponding regular cube is provided as input in wire-frame conformation.
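The three kinds of geometric parameter named above (vertex coordinates, edge connectivity, and face membership) can be illustrated with a minimal sketch. The function name `edges_from_faces` and the tetrahedron coordinates are hypothetical and chosen only for illustration; the sketch assumes that edge connectivity is recovered from the face lists, which is one possible convention and not necessarily the one used by the described methods.

```python
def edges_from_faces(faces):
    """Return the sorted undirected edges implied by a list of polygonal faces.

    Each face is a tuple of vertex indices listed in order around the polygon.
    """
    edges = set()
    for face in faces:
        for i, u in enumerate(face):
            v = face[(i + 1) % len(face)]  # next vertex around the face
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# Hypothetical input: a regular tetrahedron (4 vertices, 4 triangular faces).
vertices = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
faces = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
edges = edges_from_faces(faces)
```

For this input the six edges of the tetrahedron are recovered, and the counts satisfy V - E + F = 2, as expected for a shape that is topologically equivalent to a sphere.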

i. Wire-frame of Arbitrary Geometry

The methods can determine nucleic acid scaffold and staple sequences for nanostructures having the shape of any open or closed geometric surface that can be rendered as a polyhedral surface wire-frame model.

Therefore, the methods include reduction of the target structure to a model that represents each edge of the physical object where two or more continuous smooth surfaces meet, or that connects an object's constituent vertices using straight lines. Typically, a wireframe model of a geometric shape represents the minimum number of characteristic edges and vertices that define the 2D or 3D shape.

Typically, when some or all of the methods are carried out using a computer-based interface, the geometric parameters of a target shape are provided in a standard polyhedral file format. The geometric parameters of any open or closed, orientable surface network can serve as input using any file format that specifies polygonal geometry known in the art, including but not limited to, Polygon File Format (PLY), Stereolithography (STL), or Virtual Reality Modeling Language (WRL). When a standard polyhedral file format is provided, the code includes a parser to convert the standard polyhedral files into the required inputs.
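As an illustration of what such a parser might do, the following is a minimal sketch for the ASCII variant of the PLY format only. It is not the parser of the described software. The function name `parse_ascii_ply` is hypothetical, and the sketch assumes that the vertex element precedes the face element and that x, y and z are the first three vertex properties.

```python
def parse_ascii_ply(text):
    """Parse a minimal ASCII PLY string into (vertices, faces).

    Assumes the vertex element comes before the face element and that
    the first three vertex properties are x, y, z.
    """
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if lines[0] != "ply":
        raise ValueError("not a PLY file")
    n_vertices = n_faces = 0
    body_start = None
    for i, ln in enumerate(lines):
        parts = ln.split()
        if parts[:2] == ["element", "vertex"]:
            n_vertices = int(parts[2])
        elif parts[:2] == ["element", "face"]:
            n_faces = int(parts[2])
        elif ln == "end_header":
            body_start = i + 1
            break
    if body_start is None:
        raise ValueError("missing end_header")
    vertex_lines = lines[body_start:body_start + n_vertices]
    face_lines = lines[body_start + n_vertices:body_start + n_vertices + n_faces]
    vertices = [tuple(float(x) for x in ln.split()[:3]) for ln in vertex_lines]
    faces = []
    for ln in face_lines:
        nums = [int(x) for x in ln.split()]
        faces.append(tuple(nums[1:1 + nums[0]]))  # first number is the vertex count
    return vertices, faces


# Hypothetical input file contents: a tetrahedron.
TETRA_PLY = """ply
format ascii 1.0
element vertex 4
property float x
property float y
property float z
element face 4
property list uchar int vertex_indices
end_header
1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
3 0 1 2
3 0 3 1
3 1 3 2
3 0 2 3
"""
ply_vertices, ply_faces = parse_ascii_ply(TETRA_PLY)
```

Binary PLY files, STL and WRL inputs would each need their own handling and are outside the scope of this sketch.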

ii. Edge Geometry

In addition to the geometric parameters of a target shape, the cross-sectional shape on all edges is defined. Exemplary cross-sectional forms include two double-stranded nucleic acid helices; a square lattice (minimum four double helices); and honeycomb lattice (minimum six double helices). Each double helical section has an identification number which determines the orientation of the helix along the edge direction. To make the helices within a bundle antiparallel, the identification number of a helix should be even when its neighboring helix has an odd number.

In some embodiments, one or more edges of a shape is defined as having a square cross-sectional lattice including an even number of double helices. For example, in some embodiments, each edge includes four helices, six helices or more than six double helices, for example, 36, or 64 double helices. The square lattice is composed such that the helices are arranged with rectangular symmetry across the axis of the edge and such that any one helix can have crossovers along the edge with up to four other helices. In some embodiments, one or more edges of a shape is defined as having a honeycomb cross-sectional lattice including six double helices, eight double helices, ten double helices or more than ten double helices, for example, 12, 24, or 48 double helices arranged in a honeycomb pattern. The honeycomb lattice is composed such that the helices are arranged on a hexagon pattern across the axis of the edge and such that each helix can have crossovers with up to three other helices along the edge.

iii. Vertex Geometry

Typically, a vertex of “degree N” has “N” number of edges emerging from it. For example, if a vertex is of degree 4, it is contacted by 4 edges. An Euler circuit through a node-edge network of a given shape is guaranteed when the degree of every vertex is even. Therefore, in preferred embodiments, the degree of every vertex in the node-edge network is even, such that the Eulerian Circuit of the graph passes through each of the edges once in each direction. By choosing to have an even number of duplexes per edge in the wireframe, the vertices of the final DNA nanostructure are technically of even degrees, even if some or all of the vertices in the wireframe input are of odd degrees.
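The parity argument can be checked with a short sketch. The function names are hypothetical. The sketch counts each wireframe edge once per duplex, so two duplexes per edge double every vertex degree; it assumes a connected network, for which even degree at every vertex is the standard condition for an Eulerian circuit.

```python
from collections import Counter


def vertex_degrees(edges, duplexes_per_edge=1):
    """Degree of each vertex when every edge carries the given number of duplexes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += duplexes_per_edge
        degree[v] += duplexes_per_edge
    return dict(degree)


def all_degrees_even(degrees):
    """True when every vertex degree is even (Eulerian condition for a connected graph)."""
    return all(d % 2 == 0 for d in degrees.values())


# Tetrahedron wireframe: every vertex has odd degree 3.
tetra_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
single = vertex_degrees(tetra_edges)                         # one line per edge
double = vertex_degrees(tetra_edges, duplexes_per_edge=2)    # two duplexes per edge
```

With one line per edge the tetrahedron fails the even-degree test, while with two duplexes per edge every vertex has degree 6 and the test passes.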

There are several conventions by which to define the inradius of a vertex, which will vary according to the number of edges that combine at the vertex “junction”, and whether the angles between each edge entering or leaving the junction are equal or different relative to one another (see FIGS. 5A-5F). For example, the angles at the vertex determine the distance between the vertex and the beginning of the edge. For vertices with equal face angles (FIGS. 5A and 5D) the distance between the vertex and the beginning of the edge is the inradius of the regular polygon defined by the width of each edge in the junction. For vertices with unequal face angles, the backbone can be stretched (FIGS. 5B and 5E), or the vertex can incorporate nucleotide overlap (FIGS. 5C and 5F).

For structures including more than 2 double-helices per edge, the cross-geometry of the nucleic acid scaffold and/or staple strands on all vertices is defined for the junctions between each edge, in relation to the interface with the two or more further edges. For example, when an edge is defined as having a honeycomb cross-sectional form, the geometry of each honeycomb lattice edge can be defined as either having a beveled or non-beveled edge at the junction of multiple edges (vertex) (see FIGS. 12A-12E and FIG. 13).

In one embodiment of a non-beveled type, between two neighboring edges at a vertex exactly one helix of one edge is involved with exactly one other helix of the other edge by both a scaffold crossover as well as possibly a staple crossover, irrespective of edge lattice type. All other helices on the edge are extended or truncated to the crossover position near to the vertex. Scaffold or staple strands may be unpaired at the vertex, or no unpaired scaffold or staples may be present.

In one embodiment of the beveled type, between two neighboring edges at a vertex one helix, two helices, three helices, or more than three helices from one edge are connected with an equivalent number of helices on the neighboring edge. Thus, for example, three helices on one edge are connected to three helices on a neighboring edge. The edge length of two adjacent edges at the i-th vertex is modified when the angle between the two edges is larger than the others. At a 4-arm junction, each of four edges, denoted ‘a’ to ‘d’, is connected at the vertex i. The minimum angle θ^(i)_(min) at the i-th vertex is found, and the initial off-set distance (apothem) r_(i) is calculated from θ^(i)_(min) and the number of arms at the i-th vertex. Two cylinders (in the case of the DX tile design) are drawn on each edge with the initial off-set distance r_(i). When two cylinders located on adjacent edges do not contact each other (are not close), a new off-set distance d^(i)_(a,d) is determined, in which the subscript a,d represents the edge identifiers. The new off-set distance d^(i)_(a,d) can be solved from the two given distances m^(i)_(a,d) and n^(i)_(a,d) and the given angle θ^(i)_(a,d). The helix continues such that each helix will not sterically clash with any other helix, and crossovers of scaffold and staples will occur at the closest contact between any two neighboring helices on different edges.

Thus, the geometry of a flat type will be connected to a neighboring edge by one scaffold and/or staple crossover at the vertex. The geometry of a beveled type will be connected to a neighboring edge by the number of helices of the edge coming into the vertex divided by the number of edges the incoming edge is a neighbor to. In spherical topologies this is defined as the number of helices of the edge divided by 2. Thus, as an example, a beveled edge vertex with an edge composed of six helices total on a honeycomb lattice will share three scaffold and/or staples crossed-over to three helices on a neighboring edge while the other three helices will crossover to helices on the other neighboring edge of the particle. Typically the vertex geometry is chosen by the generator of the design prior to routing, and the geometry and placement of the crossover between the edges is automated based on extending or contracting all other helices to make crossovers at geometric positions without inducing steric clashing.

3. Providing Physical Parameters of the Target Structure

In addition to the geometric form of the target structure, the methods enable design of the physical parameters of the nucleic acid nanostructure. Physical parameters that can be specified by the user include size, molecular weight, core nucleic acid sequence, as well as pre-determination of stability. For example, the stability of the nucleic acid nanostructure in one or more solvents can be required. In an exemplary embodiment, a structure that exhibits stability in physiological salt concentrations is designed by the methods.

Therefore, the methods include design of customized nucleic acid nanostructures having a specified size, having a specified molecular weight, having a specified core nucleic acid sequence, and combinations thereof.

i. Size

Methods for the step-wise design of custom nucleic acid nanostructures can produce nanostructures of a desired size. Typically, the size of the nanostructures is specified as a function of the length of the edges that form the wire-frame model of the desired structure.

Typically, the desired length of each edge is specified. In preferred embodiments, the lengths of edges obey the natural geometry of DNA or RNA. Preferably, the specified length of each edge does not give rise to shape distortions that force deviation from the target geometry. Therefore, in preferred embodiments, the length of each edge is specified as a number of base-pairs (bp) or nucleotides (nt) that is determined to ensure that no over- or under-wind in nucleic acid duplexes occurs. Typically, the length of each edge is a multiple of the unit number of base-pairs that is required to reduce or prevent over- or under-wind in nucleic acid duplexes that form the edges of the desired nucleic acid nanostructure. In some embodiments, the unpaired nucleotides in the scaffold are used to ensure that no over- or under-wind in multiple nucleic acid duplexes occurs when the length of each edge is not a multiple of 10.5 bp.

In some embodiments the length of each edge is a multiple of 10.5 bp. In some embodiments, the length of each edge, as a multiple of 10.5 bp, is rounded up or down to the nearest nucleotide. In some embodiments the minimum length of any single edge is 31 bp. Any edge length smaller than 31 bp will create a scaffold crossover 5 nt away from the end of an edge (in the vertex staple region) and will not yield a large quantity of final folded nanostructure product. Typically, constraining edge lengths to be multiples of 10.5 bp does not limit or otherwise restrict the selection of the target shape.

In some embodiments the length of the edge is a multiple of 11 bp. In some embodiments, edges are selected to have a length of 33 bp, 44 bp, 55 bp, 66 bp, 77 bp, 88 bp, or larger than 88 bp. The minimum edge length allowed in this design paradigm is 33 bp.

In some embodiments, the desired structures have equal edge lengths throughout the geometry. For example, design of Platonic, Archimedean, or Johnson solids includes the selection of edges having a length of 31 bp, 42 bp, 52 bp, 63 bp, 73 bp, 84 bp, 94 bp, or larger than 94 bp.
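The listed edge lengths are consistent with whole-number multiples of 10.5 bp in which a half base-pair is rounded down. The following sketch reproduces the list under that assumption; the function name `edge_length_bp` is hypothetical, and the rounding direction is inferred from the listed values and is not stated as a general rule.

```python
def edge_length_bp(turn_multiple):
    """Edge length in bp for a given multiple of 10.5 bp, rounded down.

    Uses integer arithmetic: 10.5 * k == (21 * k) / 2.
    """
    return (21 * turn_multiple) // 2


# Multiples 3 through 9 of 10.5 bp.
allowed_lengths = [edge_length_bp(k) for k in range(3, 10)]
# allowed_lengths == [31, 42, 52, 63, 73, 84, 94]
```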

In some embodiments, desired structures do not have equal edge lengths throughout the geometry. Therefore, in some embodiments rounding of edge lengths is required. When rounding of edge lengths is required, the method can design nanostructures including deviations between the specified target structure and final design. For example, deviations in lengths of edges can occur at one or more edges in a structure. In these cases, the desired minimum edge length (e.g., 31 bp, or 42 bp) is assigned to the shortest edge and the other edges are scaled and rounded appropriately. Therefore, in some embodiments, where deviations between the specified target structure and final design are associated with different edge-lengths, multiple nanostructures can be designed, having deviations at one or more different edges.
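One way the scaling and rounding could work is sketched below. This is an assumption made for illustration and not the rounding rule of the described software: the shortest edge is mapped to the desired minimum length, every other edge is scaled by the same factor, snapped to the nearest multiple of 10.5 bp, and then rounded down to a whole base-pair. The function name `scale_edges_to_bp` and the input lengths are hypothetical.

```python
def scale_edges_to_bp(raw_lengths, min_bp=31):
    """Scale raw edge lengths so the shortest maps to min_bp, then round.

    Assumed rule: snap each scaled length to the nearest multiple of
    10.5 bp and round any half base-pair down.
    """
    scale = min_bp / min(raw_lengths)
    result = []
    for length in raw_lengths:
        target = length * scale                 # ideal length in bp
        turns = int(target / 10.5 + 0.5)        # nearest multiple of 10.5 bp
        result.append((21 * turns) // 2)        # 10.5 * turns, rounded down
    return result


# Hypothetical raw lengths in arbitrary mesh units.
scaled = scale_edges_to_bp([1.0, 1.414, 2.0], min_bp=31)
# scaled == [31, 42, 63]
```

In this example the middle edge has an ideal length of about 43.8 bp but is assigned 42 bp, which is the kind of deviation between target and final design described above.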

Typically, the rounding of edge lengths is carried out automatically, for example, by computer software. When using automated rounding to generate edge lengths, the user is advised to verify that edge lengths are satisfactory before proceeding to the scaffold routing procedure.

The dimensions of edges that are selected are associated with the overall dimensions of the resulting nucleic acid nanostructure. The size of a nanostructure designed by the described methods can be defined as the maximum length of the structure in a single plane. Typically, the methods can design structures having overall dimensions of approximately 10-1,000 nm, inclusive, such as 50-500 nm, 60-200 nm, or 60-100 nm, for example, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm or larger than 100 nm.

The average minimum size of a nanostructure is typically restricted by the complexity of the desired shape. Therefore, in some embodiments, the desired size of the nanostructure is a characteristic that is used in the automated design of target shapes that fulfill the desired maximum and/or minimum size criteria.

ii. Molecular Weight

The custom nucleic acid nanostructures produced according to the disclosed methods have a molecular weight. Typically, the molecular weight of the nanostructures is a function of the mass of the nucleic acids forming each of the edges that form the wire-frame model of the desired structure. Typically, the methods design nucleic acid nanostructures that have a molecular weight of between 200 kilodaltons (kDa) and 1 megadalton (1 MDa).

The molecular weight of a nanostructure is typically defined by the size and complexity of the desired shape. Therefore, in some embodiments, the desired molecular weight of the nanostructure is a characteristic that is used in the automated design of target shapes that fulfill desired maximum and/or minimum molecular weight criteria. Thus, the disclosed methods for step-wise design of custom nucleic acid nanostructures can produce nanostructures having a predetermined or preset molecular weight.

iii. Nucleic Acid Scaffold Template Sequence

In some embodiments, the methods design the sequence of staples that give rise to a nanostructure having the desired shape based on a corresponding nucleic acid sequence. Therefore, in some embodiments the input also includes providing one or more nucleic acid template sequences.

The nucleic acid template sequence can include natural nucleic acids or non-natural nucleic acids, or can include a combination of natural and non-natural nucleic acids.

In other embodiments, no input sequence is provided. Therefore, in some embodiments one or more known nucleic acid sequences are used as a default template sequence. In some embodiments the default template sequence is a sequence or a subset of a sequence corresponding to a bacteriophage. An exemplary default template sequence is a segment of the bacteriophage M13mp18. The M13mp18 single-stranded nucleic acid sequence is available at http://www.ncbi.nlm.nih.gov/nuccore/X02513 and is published as Genbank Accession Nos. X02513, M77815, and M11454 (FIG. 8A).

In other embodiments, a sequence is randomly generated. In further embodiments, the template sequence for the single-stranded DNA scaffold is determined based on the required scaffold length, for example, as determined by the Eulerian circuit corresponding to the desired shape according to the described methods. If the desired sequence is longer than the input sequence template, a sequence is randomly generated. For example, if the default template sequence is M13mp18, and the required sequence is longer than 7,249 nucleotides, a random single-stranded scaffold template sequence is generated, for example, by a computer.

Typically, the nucleic acid scaffold sequence is between 150 and 15,000 bases in length.

When DNA is used to create dsDNA helices within a nanostructure, DNA double-stranded helices having a particular conformation can be employed. For example, double-stranded DNA can be A-form DNA, B-form DNA or Z-form DNA.

B. Identifying Scaffold Routing for the Target Structure

The methods include identifying the route of the scaffold nucleic acid throughout the target structure, based on the information provided in the corresponding node-edge network of the corresponding polyhedron. Typically, the nodes and lines of the network correspond to the vertices and edges of the desired polyhedron. For example, Prim's formula can be used to find a breadth-first search spanning tree, one with the most branches. The spanning tree formula does not impose restrictions on the topology of the network. Therefore, the methods provide routing information for any arrangement of nodes and edges using a spanning tree to define the placement of scaffold crossovers.

1. Nanostructures with Antiparallel Double Crossover (DX) Motifs

In some embodiments, the methods require including at least one edge having one “DX” (anti-parallel scaffold crossover) motif. The edges with zero DX scaffold crossovers meet the definition of a spanning tree of a network. Therefore, a single DX anti-parallel scaffold crossover is positioned along every edge that does not form part of the spanning tree of the graph, preferably as close to the center of the edge as possible.

The scaffold strand is routed by a method that identifies the Eulerian circuit through the entire network, such that the strand enters each vertex from a first edge and exits the vertex from an adjacent edge that shares a face with the first edge. The route of the scaffold strand is determined according to the rules that the scaffold strand does not enter and exit from the same edge, and the scaffold strand does not exit from an edge that is not adjacent to the edge it enters. Therefore, the scaffold routing process does not allow for the intersection of DNA strands and the process produces only edges that are connected to the vertex.

Each of the steps involved in determining the route of the single-stranded nucleic acid scaffold is described in more detail below.

a. Determination of the Node-Edge Network

In some embodiments, the wire-frame model of a desired polyhedral structure is rendered as a node-edge network. Typically, the nodes and edges of the network correspond to the vertices and lines of the polyhedron. In certain embodiments, a node-edge network corresponding to a structure can be represented by the planar graph of the corresponding polyhedron, or by other means. For example, in some embodiments the planar graph of the corresponding polyhedron is a Schlegel diagram. The Schlegel diagram is a projection of the desired polyhedral form from R^(d) into R^(d-1) through a point beyond one of its facets or faces. The resulting entity is a polytopal subdivision of the facet in R^(d-1) that is combinatorially equivalent to the original polyhedral form. Formulas and methods for generating a Schlegel diagram of a polyhedral form are known in the art. In other embodiments, a node-edge network is calculated for a corresponding structure without the use of a planar graph.

Therefore, in some embodiments, the methods include the step of providing a node-edge network of the target structure. Typically, each of the vertices corresponds to a node in the network, and each line between any two vertices represents an edge in the network.

b. Creating a Spanning Tree

In some embodiments, the node-edge network is used to establish connectivity amongst all of the vertices. An exemplary representation of connectivity through the node-edge network is by producing one or more spanning trees. The spanning tree is the set of edges that connect all nodes within the network without circuits. In some embodiments, the spanning tree is determined using one or more formulas. Formulas for determining the spanning tree for a network are known in the art. An exemplary method for determining the spanning tree for the node-edge network corresponding to the desired shape is Prim's Formula. Therefore, in some embodiments, identifying scaffold routing includes creating one or more spanning trees for the node-edge network. In certain embodiments, the spanning tree is the spanning tree produced using a maximum-breadth search. If, as in this case, all edges are weighted the same, Prim's formula will generate a breadth-first search spanning tree, one with the most branches. Therefore, in some embodiments, identifying scaffold routing includes the selection of one or more spanning trees that have the most branches.
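The behavior described above can be sketched with Prim's algorithm on an equally weighted network. This is an illustrative sketch, not the implementation of the described methods. The function name `prim_spanning_tree` is hypothetical, and the sketch assumes that ties between equal weights are broken in first-in, first-out order, which is what makes the resulting tree grow breadth-first from the root.

```python
import heapq
from itertools import count


def prim_spanning_tree(n_nodes, edges, root=0):
    """Prim's algorithm with all edge weights equal to 1.

    Ties are broken by insertion order (an assumption of this sketch),
    so the tree grows breadth-first from the root.
    """
    adj = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order = count()  # insertion counter used as the tie-breaker
    in_tree = {root}
    frontier = [(1, next(order), root, v) for v in adj[root]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < n_nodes:
        _, _, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue  # this edge would close a circuit
        in_tree.add(v)
        tree.append((u, v))
        for w in adj[v]:
            if w not in in_tree:
                heapq.heappush(frontier, (1, next(order), v, w))
    return tree


# Cube network: nodes 0..7, edges join labels that differ in exactly one bit.
cube_edges = [(a, a ^ (1 << b)) for a in range(8) for b in range(3)
              if a < (a ^ (1 << b))]
cube_tree = prim_spanning_tree(8, cube_edges)
```

For the cube, the tree contains V - 1 = 7 of the 12 edges, and the first three tree edges are the three edges leaving the root, which is the branching behavior the text associates with a breadth-first tree.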

It has been shown that branching trees self-assemble more reliably than more linear trees; however, any spanning tree will provide a valid route.

c. Locating DX Crossovers

The methods include using the spanning tree to identify the route of the scaffold sequences through the target structure. For example, the methods can identify the location of anti-parallel DX cross-overs within the target structure by classifying each edge.

Determination of a spanning tree including all nodes of the network enables the identification of edges that are within the spanning tree and edges that are not within the spanning tree. Therefore, the methods include identifying edges that are within the spanning tree and edges that are not within the spanning tree. Edges within a spanning tree represent continuous stretches for the route of the single-stranded nucleic acid scaffold in both directions (i.e., 5′-3′ and 3′-5′). Edges not within a spanning tree include anti-parallel DX cross-over motifs. Therefore, for each edge that is not in the spanning tree, a pair of pseudo-nodes is added to split the edge into two halves, each corresponding to one side of a scaffold crossover. At each anti-parallel DX cross-over motif, the single-stranded nucleic acid scaffold reverses the direction it travels along.

The methods include assigning anti-parallel DX cross-over motifs at the center of each edge that is not within a spanning tree. Because a single scaffold crossover is assigned to each edge that is not within a spanning tree, and edges with zero scaffold crossovers must connect to every vertex, there can be no cycles of edges with zero scaffold crossovers, meaning that there are V−1 edges with zero scaffold crossovers, where V is the number of vertices, and the rest have one scaffold crossover.
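The edge count can be verified with a small sketch that classifies every edge as either a spanning-tree edge (zero scaffold crossovers) or a remaining edge (one scaffold crossover). The function name `classify_edges` is hypothetical, and a plain breadth-first search stands in for the spanning tree formula, which is permissible here because any spanning tree gives the same counts.

```python
from collections import deque


def classify_edges(n_vertices, edges, root=0):
    """Split edges into tree edges (zero crossovers) and the rest (one crossover).

    The spanning tree is built by breadth-first search from the root.
    """
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)
    zero = [e for e in edges if frozenset(e) in tree]
    one = [e for e in edges if frozenset(e) not in tree]
    return zero, one


# Cube network: V = 8 vertices, E = 12 edges.
cube = [(a, a ^ (1 << b)) for a in range(8) for b in range(3)
        if a < (a ^ (1 << b))]
zero_xover, one_xover = classify_edges(8, cube)
```

For the cube this gives V - 1 = 7 edges with zero scaffold crossovers and E - V + 1 = 5 edges with one scaffold crossover each.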

Locating the DX crossovers within each possible spanning tree corresponds to a unique scaffold routing.

d. Identification of the Euler Circuit

The path of the single-stranded nucleic acid scaffold is defined as the Euler Circuit of the node-edge network. Therefore, the methods include converting the spanning tree into an Eulerian circuit. Converting the spanning tree into an Eulerian circuit includes (1) adding a pair of pseudo-nodes to each edge that is identified as including a DX crossover; (2) adding a set of pseudo-nodes at each vertex in the graph, so that each edge is bounded on both ends by pseudo-nodes; (3) removing the original vertex nodes; and (4) defining the Eulerian circuit through which the continuous scaffold strand will be routed.

Typically, a vertex of degree N has N edges emerging from it. An Eulerian circuit is guaranteed when the degree of every vertex is even. Therefore, the methods include creating a scaffold route for which the degree of every vertex in the node-edge network is even. The Eulerian Circuit of the planar graph passes through each of the edges once in each direction.

Therefore, the Eulerian circuit defined by the methods passes twice along each edge of the spanning tree. The route of the scaffold strand identified by the methods ensures that (1) the scaffold strand always enters a vertex from a first edge and exits the vertex from an adjacent edge that shares a face with the first edge; and (2) the scaffold strand does not enter and exit a vertex from the same edge. Therefore, the scaffold routing process produces only edges that are connected to the vertex. The scaffold routing process does not allow for the intersection of DNA strands. Therefore, the methods provide a scaffold route that does not include internal scaffold loops that are disconnected from the rest of the scaffold.
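The routing rules can be illustrated with a compact sketch for a closed mesh with two duplexes per edge. It is a simplified model written for illustration, not the routing code of the described methods. The assumptions are: faces are listed with a consistent orientation, so each directed edge occurs exactly once; the spanning tree is built by breadth-first search; and each directed edge is split into two half-segments, labeled 0 and 1, so that a scaffold crossover at the edge center can be modeled as a switch to the antiparallel duplex. All names are hypothetical.

```python
from collections import deque


def route_scaffold(faces):
    """Trace one scaffold circuit through a closed, consistently oriented mesh.

    Returns (path, tree), where path is a list of half-segments
    ((u, v), half) and tree is the set of spanning-tree edges.
    """
    # Successor of each directed edge when walking around its face.
    next_in_face = {}
    for face in faces:
        n = len(face)
        for i in range(n):
            a, b, c = face[i], face[(i + 1) % n], face[(i + 2) % n]
            next_in_face[(a, b)] = (b, c)

    # Breadth-first spanning tree of the vertex network.
    adj = {}
    for u, v in next_in_face:
        adj.setdefault(u, []).append(v)
    root = faces[0][0]
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)

    def successor(segment):
        (u, v), half = segment
        if half == 1:
            # At vertex v: leave along the adjacent edge of the same face.
            return (next_in_face[(u, v)], 0)
        if frozenset((u, v)) in tree:
            return ((u, v), 1)   # tree edge: continue straight through
        return ((v, u), 1)       # non-tree edge: crossover, turn back toward u

    start = (next(iter(next_in_face)), 0)
    path, segment = [start], successor(start)
    while segment != start:
        path.append(segment)
        segment = successor(segment)
    return path, tree


# Tetrahedron with consistently oriented faces.
tetra_faces = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
scaffold_path, scaffold_tree = route_scaffold(tetra_faces)
```

For the tetrahedron the trace closes after 24 half-segments, which is every half of all 12 directed edges, so a single circuit covers each edge once in each direction. The turn at each vertex always follows the face, which corresponds to the rule that the strand exits along an adjacent edge sharing a face with the edge it entered from.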

The subset of Eulerian circuit that defines the route of thesingle-stranded DNA scaffold sequence through the entire polyhedralstructure is defined as the subset of Eulerian circuits known asA-trails.

The direction of the scaffold is chosen to run counterclockwise aroundeach face, so that for convex vertices (the majority of cage vertices)the major grooves of the duplexes at each vertex point inward tominimize electrostatic repulsion of the backbone. Therefore, the methodsinclude converting the undirected graph into a directed graph toimplement this directional choice.

e. Identifying Scaffold Routing for the Target Structure

Automated sequence design can be performed by first representing the target structure as a polyhedral mesh. Each edge is composed of multiple helices, so that the graph of the mesh is modified to represent these helices as multiple lines. These endpoints are then joined so that every duplex becomes part of a loop. By choosing a particular subset of these double crossovers, these discrete loops can connect to form one continuous Eulerian circuit through the entire structure, creating the scaffold routing. The spanning tree of this dual graph is then computed, and the edges that are members of the spanning tree correspond to the subset of crossovers required to complete the Eulerian circuit. Inverting the spanning tree of the dual graph back to the loop-crossover structure reveals the final scaffold routing.

f. Identifying the Sequence of the Single-Stranded Nucleic Acid Scaffold and Staple Sequences

The methods include the identification of the nucleic acid sequences of staples corresponding to the sequence of the single-stranded nucleic acid scaffold.

The length of the scaffold sequence is determined from the Eulerian circuit calculated from the input geometry, modified according to the input size, for example, as determined by the user-defined size of one or more of the edges of the structure. Typically, the sequence of the scaffold is based on a template sequence, for example, a user-defined sequence, or a known sequence, such as a bacteriophage sequence (e.g., M13mp18). If the sequence length required to provide the desired structure according to the methods is smaller than that of the default sequence, a subset of the default sequence will be output. Alternatively, if the sequence length required to provide the desired structure according to the methods is larger than that of the default sequence, a sequence will be generated.

The methods include the placement of all staple sequences. After all the staples are placed, each staple is converted to a vector of numbers, each value corresponding to the scaffold nucleotide to which it is base paired. Then, the input or generated scaffold sequence is used, matching a base identity (A, T, G, or C) to a scaffold number. If no sequence is provided, a segment of M13mp18 is used by default if the required scaffold length is less than 7249 nucleotides, and a sequence is randomly generated if the required length is greater. The complementary nucleotide via Watson-Crick base pairing is then computed and assigned to the corresponding staple nucleotides. Finally, this list of staple sequences is output for synthesis.
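
The conversion from a staple's vector of scaffold indices to a staple sequence can be sketched as follows. This is a minimal illustration under stated assumptions: scaffold positions are 1-based, the vector lists positions in the staple's 5′ to 3′ order, and unpaired staple positions are marked `None` and filled with T. The function name and the `None` convention are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def staple_sequence(scaffold, paired_indices):
    """Build a staple sequence from the scaffold bases it pairs with.

    paired_indices: 1-based scaffold positions, listed 5'->3' along the
    staple; None marks an unpaired staple base (assumed here to be T).
    """
    bases = []
    for index in paired_indices:
        if index is None:
            bases.append("T")
        else:
            bases.append(COMPLEMENT[scaffold[index - 1]])
    return "".join(bases)


# Toy scaffold; the staple pairs with positions 4, 3, 2, 1 (antiparallel).
toy_scaffold = "ATGCCGTA"
toy_staple = staple_sequence(toy_scaffold, [4, 3, 2, 1])
```

Because the staple runs antiparallel to the scaffold, the indices within one paired segment decrease along the staple.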

i. Orientation of Scaffold Sequence

The methods combine the user-defined desired size (i.e., edge-length) with the spanning tree and pseudo-node addition to determine a scaffold sequence.

The Eulerian circuit is used to identify a scaffold nick position. The scaffold is nicked at a position located on an edge without scaffold crossovers that is located on the duplex at a distance from the DX crossovers. Using Prim's formula, this edge will have Vertex #1 as one of its endpoints, since with the most-branching default all edges connected to Vertex #1 are members of the spanning tree. Marking this 5′-end as scaffold base #1, each of the scaffold bases is subsequently numbered with knowledge of the edge lengths and routing scheme, all while keeping track of its relative position on its edge.

The scaffold is designed to ensure that all staple and scaffold crossovers remain perpendicular to the helical axes. Therefore, the scaffold is designed to ensure the 5′ end overhangs the 3′ end by one nucleotide for each edge. The half-edges, namely those edges that are split by the scaffold crossover, have lengths that are pre-determined by some simplifying assumptions. The scaffold crossover is placed as close to the center as possible, with a convention set here to have a preference towards the lower-index vertex if needed. Therefore, the methods determine how long a particular section of a scaffold is on a given edge.

The methods ascribe two pieces of information to each nucleic acid base within the scaffold: (1) an index number to indicate its position on the scaffold strand; and (2) a set of numbers to indicate its spatial location, including the edge, the duplex, and the position from the 5′ end.

Typically, edges are numbered according to their order within the Eulerian circuit, starting from the position of the 5′ nick.

ii. Placement of Staple Strands

The methods identify the routing of the staple strands based on the spatial location, including the edge, the duplex, and the position from the 5′ end. For example, information contained within the set of numbers that indicate the spatial location, including the edge, the duplex, and the position from the 5′ end, is used to identify which bases in the staples are paired with which bases in the scaffold, then the former index number is assigned to the staples accordingly.

Typically, the number of staple strands varies depending upon the complexity of the structure. For structures with small scaffold strands that are of minimal complexity, such as simple tetrahedra, cubes, etc., the number of staple strands is typically about 5, 10, 50 or more than 50. For longer scaffold strands (e.g., greater than 1500 bases) and/or more complex structures, the number of staple strands can be several hundreds to thousands. For example, in some embodiments, the number of staple strands is up to 50, 100, 300, 600, 1,000 or more than 1,000.

There are three categories of staple strands, each with their own prescribed pattern: staples on vertices, staples on edges with scaffold crossovers, and staples on edges without scaffold crossovers.

The methods include a minimum edge length of 31 bp. A 31/32-bp edge has 21 bp occupied by vertex staples, leaving 10 or 11 bp for edge staples. Therefore, in both types of edges, a 20- or 22-bp staple is placed with a single crossover on one side, because a staple nick in the middle would conflict with the scaffold crossover. Therefore, the methods include a double-crossover vertex staple design in any structure with a 31- or 32-bp edge present.

The pattern of staple routing depends on the degree of the vertex, ensuring that each staple length is 52- or 78-nucleotides (nt) long for ease of synthesis.

$a = \begin{cases} 0, & \text{if } n \bmod 3 = 0 \\ 2, & \text{if } n \bmod 3 = 1 \\ 1, & \text{if } n \bmod 3 = 2 \end{cases} \qquad \left( \text{Eq. 1} \right)$

$b = \frac{n - 2a}{3} \qquad \left( \text{Eq. 2} \right)$

Where

a is the number of 52-nt staples at the vertex,

b is the number of 78-nt staples at the vertex, and

n is the degree of the vertex.
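
Eq. 1 and Eq. 2 can be evaluated with a short sketch. The function name is hypothetical, and the lower bound on the vertex degree is an assumption added so that b is never negative.

```python
def vertex_staple_counts(n):
    """Return (a, b): the numbers of 52-nt and 78-nt vertex staples
    for a vertex of degree n, following Eq. 1 and Eq. 2."""
    if n < 2:
        raise ValueError("vertex degree must be at least 2")
    a = {0: 0, 1: 2, 2: 1}[n % 3]   # Eq. 1
    b = (n - 2 * a) // 3            # Eq. 2 (always an exact division)
    return a, b


# Degrees 3 to 6 give (0, 1), (2, 0), (1, 1), (0, 2).
counts_by_degree = {n: vertex_staple_counts(n) for n in range(3, 7)}
```

In every case 2a + 3b = n, which is Eq. 2 rearranged: each 52-nt staple accounts for two of the duplex arms counted by n and each 78-nt staple for three.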

(i). Staples on Vertices

The staples on vertices pair with the first 10-11 nucleotides of each duplex abutting the vertex, with poly-T bulges of length 5 crossing between edges. There are two varieties of vertex staple designs implemented: one system uses single crossovers in some places to ensure that there is 10-11 bp of continuous duplex for high specificity and binding strength, and the other, more traditional, system uses double crossovers everywhere, leading to a minimum of 5 bp of continuous duplex. For the structures synthesized and characterized in this work, the former paradigm is used, as the higher binding strength was found to create a more cooperative transition at a higher temperature (FIGS. 9A-9L). The pattern of staple routing depends on the degree of the vertex, ensuring that each staple length is 52- or 78-nucleotides (nt) long for ease of synthesis.

(ii). Staples on Edges with Scaffold Crossovers

The edge staples pair with the intermediate nucleotides between vertex staples. For the edges with scaffold crossovers, two 31-32-nt staples are placed across the scaffold crossover, together occupying a 15-16-nt region on either side of the crossover for sufficiently strong binding. The remainder of scaffold has 42-nt staples placed to create staple crossovers every 21 base pairs, with a 20- or 22-nt staple in the case of a 10- or 11-nt remainder.

(iii). Staples on Edges without Scaffold Crossovers

The edges without scaffold crossovers follow the same pattern, filling with as many 42-nt staples as can fit and using a 20- or 22-nt staple when necessary.
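
The filling pattern for an edge without scaffold crossovers can be sketched as below. This is an illustration under assumptions, not the described implementation: `free_bp` is taken to be the number of base pairs per duplex left between the vertex staples, each 42-nt staple is taken to cover 21 bp on both duplexes, and a remainder other than 0, 10 or 11 bp is treated as an error. The function name is hypothetical.

```python
def edge_staple_lengths(free_bp):
    """Staple lengths (nt) used to fill free_bp base pairs of an edge
    that has no scaffold crossover."""
    full, remainder = divmod(free_bp, 21)
    staples = [42] * full                 # crossovers every 21 bp
    if remainder in (10, 11):
        staples.append(2 * remainder)     # one 20- or 22-nt staple
    elif remainder != 0:
        raise ValueError("remainder must be 0, 10 or 11 bp")
    return staples


# A 31-bp edge leaves 10 bp after the 21 bp taken by vertex staples.
minimum_edge = edge_staple_lengths(31 - 21)
```

With these assumptions the minimum 31-bp edge receives a single 20-nt staple, in line with the 31/32-bp case described above.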

g. Output of Staple Sequences

The methods provide the nucleic acid sequences of staple strands corresponding to the desired target sequence, edge size(s) and optionally a template nucleic acid sequence.

After all the staples are placed according to the methods, each staple is a vector of numbers, each value corresponding to the scaffold nucleotide to which it is base paired. Then, the input or generated scaffold sequence is used, matching a base identity (A, T, G, or C) to a scaffold number.

If no sequence is provided, a default sequence is used. For example, in some embodiments, if the required scaffold length is less than 7249 nucleotides, a segment of M13mp18 nucleic acid sequence is used. In other embodiments, a sequence is randomly generated. The methods determine complementary nucleotides via Watson-Crick base pairing and assign sequences to the corresponding staple nucleotides. Typically, the methods produce this list of staple sequences as output. Therefore, in some embodiments the methods also include the step of synthesizing the staple sequences. In some embodiments the methods include the step of synthesizing the scaffold sequence. In some embodiments, the methods include the step of synthesizing the scaffold sequence and the staple sequences.

h. Scaffold Sequence Output based on User-Defined Staples

Methods to generate staple strand sequences given a scaffold sequence can be inverted, so that the user provides staple strand sequences that are used to generate a scaffold sequence.

The methods for custom-design of a nanostructure having desired geometric parameters can also be used to determine the nucleic acid sequence of a scaffold sequence that will fold into the desired shape based on hybridization with one or more user-defined staple sequences. Therefore, in some embodiments, the methods provide the nucleic acid scaffold sequence, based on the input of user-defined staple strands, desired target structure and optionally edge size(s).

The methods provide a custom scaffold sequence that is based on user-defined staple sequences. Typically, the number and size of scaffold sequences that are required by the user will vary according to the desired geometry of the nanostructure. In some embodiments, at least one, two or three staple sequences are required as input. In certain embodiments, one or more staple sequences are required as input, and the methods provide the sequence(s) of one or more remaining, or undefined staple sequences.
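
The inverted direction, from user-defined staples to a scaffold sequence, can be sketched in the same way. The assumptions are that each staple comes with its vector of 1-based scaffold positions in 5′ to 3′ staple order, that `None` marks an unpaired staple base, and that scaffold positions not covered by any user-defined staple are left as the placeholder `N`. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def scaffold_from_staples(length, staples, placeholder="N"):
    """Derive scaffold bases from user-defined staples.

    staples: list of (sequence, paired_indices) pairs, where
    paired_indices holds the 1-based scaffold position paired with each
    staple base (None for an unpaired staple base).
    """
    scaffold = [placeholder] * length
    for sequence, paired_indices in staples:
        for base, index in zip(sequence, paired_indices):
            if index is None:
                continue
            required = COMPLEMENT[base]
            current = scaffold[index - 1]
            if current not in (placeholder, required):
                raise ValueError(f"conflicting staples at position {index}")
            scaffold[index - 1] = required
    return "".join(scaffold)


# One staple pairing with scaffold positions 4, 3, 2, 1 of a 6-nt scaffold.
partial_scaffold = scaffold_from_staples(6, [("GCAT", [4, 3, 2, 1])])
```

Positions still marked `N` would then be filled from a template or generated sequence.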

2. Nanostructures with Parallel Crossover Motifs

In some embodiments, the methods require including at least one edge having one “PX” (parallel paranemic scaffold crossover) motif. Therefore, in some embodiments, there are two double helices per edge oriented in parallel vertically, that is, one of the duplexes is closer to the interior of the object than the other. In some embodiments, the scaffold cannot be an arbitrary sequence, because self-hybridization must occur to complete the structure. Self-hybridizing regions replace the need for staple strands, so in some embodiments one nucleic acid strand can fold and hybridize to itself to form an origami nanostructure without any other oligonucleotides.

The scaffold strand is routed by a method that identifies the Eulerian circuit through the entire network, such that the strand enters each vertex from a first edge and exits the vertex from an adjacent edge that shares a face with the first edge. The route of the scaffold strand is determined according to the rules that the scaffold strand does not enter and exit from the same edge, and the scaffold strand does not exit from an edge that is not adjacent to the edge it enters. Therefore, the scaffold routing process does not allow for the intersection of DNA strands and the process produces only edges that are connected to the vertex.

Each of the steps involved in determining the route of the single-stranded nucleic acid scaffold is described in more detail, below.

a. Determination of the Node-Edge Network

In some embodiments, the wire-frame model of a desired polyhedral structure is rendered as a node-edge network. Typically, the nodes and edges of the network correspond to the vertices and lines of the polyhedron. In certain embodiments, a node-edge network corresponding to a structure can be represented by the planar graph of the corresponding polyhedron, or by other means. For example, in some embodiments the planar graph of the corresponding polyhedron is a Schlegel diagram. The Schlegel diagram is a projection of the desired polyhedral form from R^(d) into R^(d-1) through a point beyond one of its facets or faces. The resulting entity is a polytopal subdivision of the facet in R^(d-1) that is combinatorially equivalent to the original polyhedral form. Formulas and methods for generating a Schlegel diagram of a polyhedral form are known in the art. In other embodiments, a node-edge network is calculated for a corresponding structure without the use of a planar graph.

Therefore, in some embodiments, the methods include the step of providing a node-edge network of the target structure. Typically, each of the vertices corresponds to a node in the network, and each line between any two vertices represents an edge in the network.

b. Creating a Spanning Tree

In some embodiments, the node-edge network is used to establish connectivity amongst all of the vertices. An exemplary representation of connectivity through the node-edge network is by producing one or more spanning trees. The spanning tree is the set of edges that connect all nodes within the network without circuits. In some embodiments, the spanning tree is determined using one or more formulas. Formulas for determining the spanning tree for a network are known in the art. An exemplary method for determining the spanning tree for the node-edge network corresponding to the desired shape is Prim's Formula. Therefore, in some embodiments, identifying scaffold routing includes creating one or more spanning trees for the node-edge network. In certain embodiments, the spanning tree is the spanning tree produced using a maximum-breadth search. If, as in this case, all edges are weighted the same, Prim's formula will generate a breadth-first search spanning tree, one with the most branches. Therefore, in some embodiments, identifying scaffold routing includes the selection of one or more spanning trees that have the most branches.

It has been shown that branching trees self-assemble more reliably than more linear trees; however, any spanning tree will provide a valid route.
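
A breadth-first spanning tree of the kind described above can be sketched as follows. The sketch uses a plain breadth-first search rather than a weighted Prim implementation, and it assumes vertices numbered from 0 and edges given as vertex pairs; the function name is hypothetical.

```python
from collections import deque


def bfs_spanning_tree(num_vertices, edges, root=0):
    """Return the edges of a breadth-first spanning tree rooted at root."""
    neighbors = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    visited = {root}
    queue = deque([root])
    tree = []
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in visited:
                visited.add(v)
                tree.append((u, v))
                queue.append(v)

    if len(visited) != num_vertices:
        raise ValueError("the node-edge network is not connected")
    return tree


# Tetrahedron: every edge at the root vertex ends up in the tree.
tetrahedron = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree = bfs_spanning_tree(4, tetrahedron)
```

Starting from the root, the search takes every edge at the root first, which yields the highly branched tree that the text prefers.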

c. Classifying Edges

The methods include using the spanning tree to classify the edges, culminating in the final Eulerian circuit the scaffold strand takes through the target structure.

There are four classifications the edges can have, based on choosing between two options for two traits. One trait is the crossover motif of the edge. Each edge can employ either anti-parallel (DX) or parallel (PX) crossovers. The second trait is determined by membership in the spanning tree. Edges that are members of the spanning tree must have each scaffold fragment, that is, the portion of the scaffold strand within the edge, start and end at different vertices. Edges that are not members of the spanning tree must have each scaffold fragment start and end at the same vertices. Note that this is an extension of the classification used for the two-helix-per-edge DX structures; the classifications and choice of scaffold crossover location follow the same start and end rules as described above.

d. Superimposing and Connecting Edges

Based on the classification (crossover motif, spanning tree membership) and the length of the edge, a set of scaffold fragments, and in some embodiments, staple strands, with routing within the edge already determined, is superimposed on the edge. In some embodiments, this is represented by an M×4 matrix, where M is the length of the edge, and each of the four columns represents one strand, e.g. Column 1 represents the nucleotides 3′ to 5′ from the vertex at the top to the vertex at the bottom in the duplex closer to the interior of the object, Column 2 represents the nucleotides 5′ to 3′ from the vertex at the top to the vertex at the bottom in the interior duplex, Column 3 represents the nucleotides 5′ to 3′ from the top vertex to the bottom vertex in the duplex closer to the exterior of the object for PX edges and 3′ to 5′ for DX edges, and Column 4 represents the nucleotides 3′ to 5′ from the top vertex to the bottom vertex in the exterior duplex for PX edges and 5′ to 3′ for DX edges. Nucleotides in Columns 1 and 2 are complementary via Watson-Crick base pairing, and nucleotides in Columns 3 and 4 are complementary in the same manner. Nucleotides in the same row are the same interpolated distance between the two vertices.

In some embodiments, the elements of the matrix determine the route of the scaffold and enforce the crossover motif; for PX edges, the major/minor groove pattern is also enforced. Elements that are consecutive in number, e.g., 4 and 5, or i and i+1, represent nucleotides that share a covalent phosphodiester bond, and elements that are in the same row and are in paired columns (1 and 2, 3 and 4) are base paired. For PX edges, the major/minor groove pattern is the number of bases that lie in the major and minor grooves of the double helix. In some embodiments, the number of bases in a major groove can be less than 5, 5, 6, 7, 8, 9, or more than 9, and the number of bases in a minor groove can be less than 4, 4, 5, 6, or more than 6. The major/minor groove pattern also determines where parallel crossovers can occur. In some embodiments, this is reflected in the matrix as when consecutive nucleotides are not in the same column, e.g. nucleotide 4 is in Column 1 and nucleotide 5 is in Column 4.

When all of the edges have been superimposed, the first and last rows of Columns 1 and 2 of each edge matrix represent the 5′ and 3′ ends that must be joined to neighboring edges at the vertex. The connection is enforced by updating each nucleotide's number to uniquely identify its position in the complete scaffold strand, maintaining that consecutive numbers indicate connection along the phosphodiester backbone.

e. Identifying the Sequence of the Single-Stranded Nucleic Acid Scaffold and Staple Sequences

The methods include the identification of the nucleic acid sequences of scaffold and staples corresponding to the hybridization pattern set by the routing described above.

In regions of parallel crossovers, the sequence must be customized such that Watson-Crick base pairing is followed. In regions of anti-parallel crossovers, the scaffold sequence can be arbitrary, and the staple sequences that hybridize to it must follow Watson-Crick base pairing.

In some embodiments, the scaffold nick is chosen to be placed at the end of a farther-from-center duplex. This may be on a PX or DX edge. The 5′ end of the nick is marked as base #1, and the 3′ end is the last base of the scaffold. Some scaffold nucleotides may be part of hairpin loops and do not have bases paired to them; the numbering of the scaffold strand remains unchanged, but these regions may be marked as single-stranded nucleic acid strands.

For these custom sequences, in some embodiments a random number generator choosing between 1 and 4 inclusive, with the values mapping to A, C, G, T for DNA and A, C, G, U for RNA, produces the sequence of one member of each base pair, and its partner's sequence is found via canonical Watson-Crick base pairing. If certain staple sequences are to be incorporated, for example, if they have been functionalized and need to bind to the larger origami structure, then the sequences of those regions are determined from the target staple sequences.

With this, the methods ascribe to each nucleic acid base within the scaffold (1) an index number to indicate its position on the scaffold strand; and (2) a set of numbers to indicate its spatial location, including the edge, the duplex, and the position from the 5′ end.
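
The random generation of one strand and the derivation of its partner can be sketched as follows. This is a minimal illustration that assumes Python's standard `random` module as the random number generator and returns the partner strand written 5′ to 3′; the function names are hypothetical.

```python
import random

DNA = "ACGT"
RNA = "ACGU"


def random_strand(length, rng, alphabet=DNA):
    """Draw integers 1..4 and map them to A, C, G, T (or U for RNA)."""
    return "".join(alphabet[rng.randint(1, 4) - 1] for _ in range(length))


def partner_strand(sequence, alphabet=DNA):
    """Watson-Crick partner of sequence, written 5'->3'."""
    pairs = {alphabet[0]: alphabet[3], alphabet[3]: alphabet[0],
             alphabet[1]: alphabet[2], alphabet[2]: alphabet[1]}
    return "".join(pairs[base] for base in reversed(sequence))


rng = random.Random(0)                 # seeded only for reproducibility
strand = random_strand(21, rng)
partner = partner_strand(strand)
```

Regions fixed by user-defined staple sequences would be filled from those sequences instead of from the generator.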

f. Placement of Staple Strands

In edges with anti-parallel crossovers, staples may be necessary to bring together the portions of scaffold within the edge. In some embodiments, the superimposed edges contain regions where the staples lie based on their numbers being non-consecutive with the rest of the bases in the edges. In this embodiment, vertex staples are not required because only one duplex from each edge meets at the vertex.

g. Output of Staple and/or Scaffold Sequences

The methods provide the nucleic acid sequences of scaffold and staple strands corresponding to the desired target edge size(s) and geometry. Unlike the embodiment that only contains DX motifs, the scaffold sequence is, in part or in whole, a custom sequence.

Based on the nucleotide sequences generated in the previous steps, the methods typically produce this list of staple sequences and scaffold sequence as output. Therefore, in some embodiments the methods also include the step of synthesizing the staple sequences. In some embodiments the methods include the step of synthesizing the scaffold sequence. In some embodiments, the methods include the step of synthesizing the scaffold sequence and the staple sequences.

C. Assembling Nucleic Acid Nanostructures

Typically, following design according to the described methods, the nucleic acid nanostructures are synthesized, folded and purified prior to structural validation. Therefore, methods for the design of nucleic acid nanostructures having a desired form optionally include the step of producing the nucleic acid nanostructure. In some embodiments, producing the nanostructure includes synthesizing nucleic acids having the sequence of the scaffold and staples according to the designed form; hybridizing the staple sequences to the scaffold; folding the nanostructure; purifying the nanostructure; performing structural analysis of the nanostructure; validating the structure; and combinations thereof.

1. Production of Nucleic Acid Nanostructures

The methods provide the nucleic acid sequences of the single-stranded scaffold and the oligonucleotide staple sequences that can be combined to form complete three-dimensional nucleic acid nanostructures of a desired form and size. Typically, the methods convert the information provided as geometric parameters corresponding to the desired form and the desired dimensions into the sequences of oligonucleotides that can be synthesized using any means for the synthesis of nucleic acids known in the art.

a. Single-Stranded Scaffold DNA Sequence

Scaffold nucleic acid sequences and oligonucleotide staple sequences can be synthesized or purchased from numerous commercial sources. In some embodiments, the scaffold nucleic acid sequence is the M13mp18 single-stranded DNA scaffold. The M13mp18 ssDNA can be purchased from multiple commercial sources, including New England Biolabs (Cat #N4040S), or from Guild Biosciences in various M13mp18 sizes.

Typically, scaffold DNA of the desired length is produced using polymerase chain reaction (PCR) methodologies. Standard methods for PCR are known in the art. In some embodiments, the nucleic acid nanostructures are produced using asymmetric PCR (aPCR). When aPCR amplification is used, oligonucleotide primers can be designed to generate many different scaffold lengths. Therefore, in some embodiments, the scaffold having a desired length is produced using one or more custom oligonucleotides. When the template scaffold nucleic acid is known, a set of known oligonucleotides can be used. For example, when the scaffold nucleic acid is the M13mp18 ssDNA, the primers in Table 2 can be used to design scaffolds of desired lengths. In some embodiments, modified dNTPs (examples of modified dNTPs include, but are not limited to, dUTP, Cy5-dNTP, biotin-dNTPs, alpha-phosphate-dNTPs) are used for amplification of the ssDNA scaffold. In other embodiments, the template used is the Lambda phage, which can be purchased from different commercial sources, including New England Biolabs (Cat #N3011S). In other embodiments, the nucleic acid nanostructures are produced using digestion of the template DNA to form a scaffold nucleic acid of the desired length. In certain embodiments, a combination of PCR and digestion methods is used to produce scaffold single-stranded nucleic acid of the desired length.

When nucleic acid scaffold sequences are required to be synthesized, the scaffolds can be synthesized using asymmetric PCR, for example, using GBLOCK® DNA commercially available from Integrated DNA Technologies as a template.

2. Assembly of Nanostructures

The methods include assembly of the single-stranded nucleic acid scaffold and the corresponding staple sequences into the nanostructure of the desired shape and size. Typically, the assembly is carried out by hybridization of the staples to the scaffold sequence. Therefore, in some embodiments, the nucleic acid nanostructures are assembled by DNA origami annealing reactions. For example, the oligonucleotide staples are mixed in the appropriate quantities in an appropriate reaction volume. In preferred embodiments, the staple strand mixes are added in an amount effective to maximize the yield and correct assembly of the nanostructure. For example, in some embodiments, the staple strand mixes are added in molar excess of the scaffold strand. In an exemplary embodiment, the staple strand mixes are added at a 10-20× molar excess of the scaffold strand.

Annealing can be carried out according to the specific parameters of the staple and scaffold sequences.

3. Purification of Nucleic Acid Nanostructures

The methods include purification of the assembled nucleic acid nanostructures. Purification separates assembled structures from the substrates and buffers required during the assembly process. Typically, purification is carried out according to the physical characteristics of nanostructures. For example, the use of filters and/or chromatographic processes (FPLC, etc.) is carried out according to the size and shape of the nanostructures.

In an exemplary embodiment, nucleic acid nanostructures are purified using filtration, such as by centrifugal filtration, or gravity filtration. In some embodiments, filtration is carried out using an Amicon Ultra-0.5 mL centrifugal filter (MWCO 100 kDa).

Following purification, nucleic acid nanostructures can be placed into an appropriate buffer for storage, and/or subsequent structural analysis and validation. Storage can be carried out at room temperature (i.e., 25° C.), 4° C., or below 4° C., for example, at −20° C. Suitable storage buffers include PBS, TAE-Mg²⁺ or DMEM.

D. Predicting 3D Structure

Methods for designing nucleic acid nanostructures of a desired shape and size can include steps for validation of the resulting nucleic acid structure based on the output sequences. For example, in some embodiments, the methods also include the step of predicting the 3-dimensional coordinates of the nucleic acids within the nucleic acid nanostructure, based on the output of the system used for positioning scaffold and, when present, staple sequences. When structural information for a nucleic acid nanostructure is predicted, the predicted information can be used to validate the nucleic acid nanostructure. Typically, validation of the resulting nucleic acid structure includes (1) calculating the positions of each base pair in the structural model; (2) determining the positions of each base pair in the nucleic acid nanostructure; and (3) comparing the calculated structural data obtained for the model with that experimentally determined (i.e., observed) for the nanostructure.

1. In Silico Modelling of Structural Data

The 3-dimensional coordinates of a nucleic acid base pair can be calculated by any means known in the art. In a preferred embodiment, the positions of each base pair in the structural model are calculated using computational modelling. Therefore, in some embodiments, in silico modelling is used to predict the three-dimensional structural features of a polyhedral nucleic acid nanostructure designed from a target model according to the methods. The parameters used for modelling the 3-dimensional coordinates of a nucleic acid base pair of a given nanostructure designed according to the described methods are determined based upon the presence of antiparallel or parallel crossovers within the structure.

a. In Silico Modelling of Nanostructures Including Antiparallel Crossovers

In some embodiments, in silico modelling is used to predict the three-dimensional structural features of a polyhedral nucleic acid nanostructure including anti-parallel crossovers, designed from a target model according to the described methods.

When in silico modelling is used to predict structural features of nucleic acid nanostructures including antiparallel crossovers, in silico modelling can be used to predict the position of each base pair in the structural model by interpolating between the two ends of the edge it resides on, and shifting away perpendicularly from the central axis by 10 Å, half the inter-helical distance for an anti-parallel crossover. The edge is assumed to lie in a plane with a normal vector defined by the sum of the unit normal vectors of the two neighboring faces. There are several ways to define the location of the ends of the edges. The DX-tile edges can be assumed to be two parallel cylinders with combined width 40 Å (20 Å inter-helical distance and 20 Å duplex diameter). This can be further simplified to a rectangle with width 40 Å, with the line of the edge serving as a central axis. In the ideal case, the corners of these rectangles meet, since the scaffold exits and enters the edge from these locations. The widths of the rectangles together would form an N-sided regular polygon, because they have the same side lengths and have equal angles between them. The perpendicular distance from the center of this polygon to an edge (the beginning of the interpolation) is the inradius of this polygon. From the inradius, the distance between the vertex and the beginning of the DX-tile edge is determined using the sum of the face angles. If the multi-arm DX-tile were flat, this would be equivalent to the inradius.

$s = \frac{2\pi}{\theta_{tot}}\, r \qquad \left( \text{Eq. 3} \right)$

where s is the distance between the vertex and the beginning of the DX-tile edge,

r is the inradius of the polygon formed by the widths of the tiles, and

θ_(tot) is the sum of all face angles at the vertex.

For regular N-sided polygons,

$r = \frac{w}{2} \cot\left( \frac{\pi}{N} \right) \qquad \left( \text{Eq. 4} \right)$

where w is the combined width of the DX-tile (40 Å).

In some embodiments, in silico modelling is used to predict theco-ordinates of nucleic acids within structures whose edges do not meetat regular angles. Exemplary structures whose edges do not meet atregular angles include the Archimedean solids. In that case, dependingon the convention used to define the length of the inradius, there willbe backbone stretches or nucleotide overlaps. For the cuboctahedron, arepresentative Archimedean solid, the size of the object is best fitwhen backbone stretches are minimized, where the inradius is calculatedbased on the largest face angle.

$$r = \frac{w}{2}\cot\left(\frac{\theta_{\max}}{2}\right) \qquad \text{(Eq. 5)}$$

where θ_max is the largest face angle. Note that this general equation applies to regular N-sided polygons as well, since θ_max = 2π/N.

For structures with concave vertices, where θ_tot > 2π, s = r is defined in order to obey the convention that all edge axes meet at a single point, creating a sphere of radius r that defines the edge boundaries.
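Eqs. 3 to 5 and the concave-vertex rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the described methods: the function names are hypothetical, angles are assumed to be in radians, and how θ_max is measured for a given non-planar vertex is left to the caller.

```python
import math


def inradius_regular(width, n_sides):
    """Eq. 4: inradius of a regular N-sided polygon whose side length is `width`."""
    return 0.5 * width / math.tan(math.pi / n_sides)


def inradius_from_angle(width, theta_max):
    """Eq. 5: inradius computed from the largest face angle theta_max (radians)."""
    return 0.5 * width / math.tan(0.5 * theta_max)


def vertex_to_edge_distance(r, theta_tot):
    """Eq. 3: distance s from the vertex to the start of the edge.

    For vertices with theta_tot > 2*pi the text sets s = r instead.
    """
    if theta_tot > 2.0 * math.pi:
        return r
    return (2.0 * math.pi / theta_tot) * r


DX_TILE_WIDTH = 40.0  # Angstrom, combined width of a DX-tile edge

# Flat four-arm junction: theta_tot = 2*pi, so s equals the inradius.
r_flat = inradius_regular(DX_TILE_WIDTH, 4)
s_flat = vertex_to_edge_distance(r_flat, 2.0 * math.pi)
```

For the flat four-arm case both inradius formulas give r = 20 Å and s = r, which matches the statement that a flat multi-arm tile has s equal to the inradius.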

b. In Silico Modelling of Nanostructures Including Parallel Crossovers

Scaffold Sequence Output Based on User-Defined Staples

In some embodiments, in silico modelling is used to predict the three-dimensional structural features of a polyhedral nucleic acid nanostructure having parallel crossover motifs, designed from a target model according to the described methods. For example, in silico modelling can be used to predict the position of each base pair in the structural model by interpolating between the two ends of the edge it resides on. If the base pair is part of the interior duplex of the edge, no shifting is necessary; if the base pair is part of the exterior duplex, the position is shifted away along the outward normal of the edge by 20 Å, the inter-helical distance. There are several ways to define the location of the ends of the edges, which are the 5′ and 3′ ends of the interior duplex. The interior duplex can be assumed to be a cylinder with diameter 20 Å. This can be further simplified to a rectangle with width 20 Å. In the ideal case, the corners of these rectangles meet at the vertex, since the scaffold exits and enters the edge from these locations. The widths of these rectangles together would form an N-sided regular polygon, because they have the same side lengths and equal angles between them. The perpendicular distance from the center of this polygon to an edge (the beginning of the interpolation) is the inradius of this polygon.

Calculating the inradius r and the distance s between the vertex and the beginning of the interior duplex follows the same procedure as described with Eqs. 3 to 5, above, except that w in this case is the diameter of a duplex (e.g., 20 Å), instead of the width of a DX-tile (e.g., 40 Å).
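The interpolation of base-pair positions along an edge, with an optional shift along a normal, can be illustrated as follows. This is a sketch under stated assumptions: the function name and arguments are hypothetical, base pairs are assumed to be evenly spaced with the first and last placed exactly on the edge ends, and the normal direction is supplied by the caller. A shift of 0 corresponds to the interior duplex of a parallel-crossover edge, 20 Å to its exterior duplex, and 10 Å to either duplex of an anti-parallel (DX) edge.

```python
import math


def base_pair_positions(start, end, n_bp, normal=None, shift=0.0):
    """Evenly interpolate n_bp positions between two edge ends (Angstrom).

    If `normal` is given, every position is displaced by `shift` along the
    unit vector in that direction. Spacing and end placement are assumptions.
    """
    if n_bp < 1:
        raise ValueError("n_bp must be at least 1")
    offset = (0.0, 0.0, 0.0)
    if normal is not None and shift != 0.0:
        length = math.sqrt(sum(c * c for c in normal))
        if length == 0.0:
            raise ValueError("normal must be a non-zero vector")
        offset = tuple(shift * c / length for c in normal)
    positions = []
    for i in range(n_bp):
        t = i / (n_bp - 1) if n_bp > 1 else 0.0
        positions.append(
            tuple(s + t * (e - s) + o for s, e, o in zip(start, end, offset))
        )
    return positions


# Interior duplex (no shift) and exterior duplex (20 Angstrom along the normal).
interior = base_pair_positions((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 3)
exterior = base_pair_positions(
    (0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 3, normal=(0.0, 0.0, 2.0), shift=20.0
)
```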

2. Validation of Observed Structural Data

For validation, the predicted three-dimensional model for a given structure is compared with the experimentally determined structural data. For example, the in silico prediction of structure(s) for a given input shape, size and optionally a nucleic acid sequence can be compared with actual structural data. Therefore, the methods can include the step of using data obtained by in silico modelling of a virtual structure to validate the structural parameters of a nucleic acid nanostructure designed and synthesized according to the methods. In certain embodiments, a virtual structure prepared by in silico modelling is used as a control for the design and synthesis methods.

Actual structural data corresponding to a nucleic acid nanostructure produced according to the methods can be obtained using any method known in the art. Exemplary methods for acquiring and analyzing biophysical data for macromolecular structures include X-ray crystallography, Nuclear Magnetic Resonance (NMR), Cryo-electron microscopy, Atomic Force Microscopy, Light Microscopy, Small-angle X-ray diffraction, Circular Dichroism, Analytical Ultracentrifugation, chromatographic methods, and combinations thereof.

In some embodiments, differences between the in silico prediction of structural features and actual structural features identify structural deviations, etc.

III. Systems

A. Computer Implemented Systems

The systems and methods provided herein are generally useful for predicting the design parameters that produce a nucleic acid nanostructure having a desired polyhedral shape. In some embodiments, the geometric parameters corresponding to the desired form and the desired dimensions are input using a computer-based interface that allows for the design process to be carried out in a completely in silico manner. For example, in certain embodiments, the methods are implemented in computer software, or as part of a computer program that is accessed and operated using a host computer. In other embodiments, the methods are implemented on a computer server accessible over one or more computer networks.

FIG. 1 depicts the work flow of methods that can be implemented. In some embodiments a user accesses a computer system that is in communication with a server computer system via a network, e.g., the Internet or in some cases a private network or a local intranet. One or both of the connections to the network may be wireless. In a preferred embodiment the server is in communication with a multitude of clients over the network, preferably a heterogeneous multitude of clients including personal computers and other computer servers as well as hand-held devices such as smartphones or tablet computers. In some embodiments the server computer is in communication with, i.e., is able to receive an input query from or direct output results to, one or more laboratory automation systems, e.g., one or more automated laboratory systems or automation robotics that automate biochemical assays, PCR amplification, or synthesis of PCR primers. See, for example, automated systems available from Beckman Coulter.

The computer server where the methods are implemented may in principle be any computing system or architecture capable of performing the computations and storing the necessary data. The exact specifications of such a system will change with the growth and pace of technology, so the exemplary computer systems and components should not be seen as limiting. The systems will typically contain storage space, memory, one or more processors, and one or more input/output devices. It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit). The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, etc. In addition, the term “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices, e.g., keyboard, for making queries and/or inputting data to the processing unit, and/or one or more output devices, e.g., a display and/or printer, for presenting query results and/or other results associated with the processing unit. An I/O device might also be a connection to the network where queries are received from and results are directed to one or more client computers. It is also to be understood that the term “processor” may refer to more than one processing device. Other processing devices, either on a computer cluster or in a multi-processor computer server, may share the elements associated with the processing device. Accordingly, software components including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory or storage devices (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole into memory (e.g., into RAM) and executed by a CPU.
The storage may be further utilized for storing program codes, databases of genomic sequences, etc. The storage can be any suitable form of computer storage including traditional hard-disk drives, solid-state drives, or ultrafast disk arrays. In some embodiments the storage includes network-attached storage that may be operatively connected to multiple similar computer servers that comprise a computing cluster.

1. Preparation of Nanostructure Libraries

In some embodiments, nanostructure libraries are designed by automated methods. Automated design programs for generating DNA nanostructures allow for a diverse set of geometries to be made, towards the synthesis of a library of objects for applications as diverse as nano-casting, delivery, and structural scaffolding. Libraries of DNA nanostructures with diverse sequences and geometries are also useful for diverse applications in memory storage, biomaterials synthesis, controlled nanoscale bioreactors, excitonic materials discovery, vaccine development, and therapeutic delivery including cancer immunotherapy. For example, in some embodiments, a library or libraries of nucleic acid nanostructures can be constructed with single-strand bait sequences complementary to one or more target molecules. In an exemplary embodiment, the single-strand bait sequences include sequences that are complementary to one or more loops of a target RNA.

a. High-throughput Production of Nanostructures and Modifications

Systems for the generation of libraries of nanostructures including different modifications can be implemented using automated methods. For example, the methods can provide the sequences of short single-stranded oligonucleotide staple strands of approximately 10-1,000 nucleotides that include “bait” sequences that are complementary in sequence to a region, or regions, of a target molecule. In some embodiments the target molecules include RNAs, DNAs, PNAs, LNAs, proteins, lipids, carbohydrates, small molecules, etc. In an exemplary embodiment, the target molecule is a ribonucleic acid. Typically, target molecules interact with bait sequences on nanostructures via covalent or non-covalent linkage to the bait sequence. Exemplary linkages include either chemical conjugation via nucleic acid overhangs with click chemistry/other groups, or hybridization forces. When these staple strands are incorporated into the nanostructures, their position is defined by the design as part of the formation of the nanostructure, where the 5′ end of the staple meets the 3′ end of itself or another staple. Therefore, methods for creating libraries of polyhedral nucleic acid nanostructures for capturing one or more target molecules are provided. In some embodiments, the in silico design of polyhedral nucleic acid nanostructure libraries includes defining ranges for the desired properties of nanostructures within the library pool. Exemplary input ranges include minimum and maximum values for properties such as size and vertex geometry, as well as the spatial arrangement and sequence diversity of bait sequences for capturing target molecules.

Typically, computational systems are applied to automate sequence designs of a diverse set of DNA nanostructures. DNA nanostructures vary in many ways, including in object geometry (as shown in FIG. 2A and FIG. 13), edge lengths between vertices, staple nick positions along each edge (including nicks either inwardly facing or outwardly facing from the object), bait sequence orientation on the object (into or out from the edge tile), and the set of bait sequences for capturing the RNA. Different types of edges can provide distinct orientations of bait sequences. Alternative structured DNA assemblies include bricks and bricks with holes or cavities, assembled using DNA duplexes packed on square or honeycomb lattices (Douglas et al., Nature 459, 414-418 (2009); Ke Y et al., Science 338: 1177 (2012)). Paranemic-crossover (PX)-origami, in which the nanostructure is formed by folding a single long scaffold strand onto itself, can alternatively be used, provided bait sequences are still included in a site-specific manner. Further diversity can be introduced by using different edge types, including 6-, 8-, 10-, or 12-helix bundles. Further topologies such as ring structures are also usable, for example a 6-helix bundle ring.

Generally, the object geometry, edge length, and sequence topology dictate the scaffold and core staples, which are staple strands found in each class of nanostructure but are not functionalized in the library.

Generally, the high-throughput library generation of structured DNA origami assemblies is achieved via multiple automated steps. An automated design program for generating DNA nanostructures allows for a diverse set of geometries to be made (FIG. 2A and FIG. 13), towards the synthesis of a library of objects for applications as diverse as nano-casting, delivery, and structural scaffolding.

In some embodiments, a computational approach is used to generate a set of geometric objects with specific 3D overhangs complementary to single-stranded loops of HIV RNAs, seeking maximum coverage of Euclidean space by the overhangs, to allow for the greatest number of objects to be tested while being experimentally practical. The number of geometric objects generated in silico is about 10⁵, 2×10⁵, 3×10⁵, 4×10⁵, 5×10⁵, 6×10⁵, 7×10⁵, 8×10⁵, 9×10⁵, 10⁶, 10⁷, or more than 10⁷.

In some embodiments, the object generation approach is automated to attain maximum spatial coverage of the right size order of the overhangs in the fewest possible objects, limiting redundancy of spatial coordination. In some embodiments, a wide diversity of objects is used to ensure maximal coverage across the space of possibilities, such that the final experimental library has near complete spatial coverage.

In preferred embodiments, automated liquid handlers are used for generating these structure mixes. Typically, three high-throughput liquid dispensing steps are used for library generation, involving dispensing of the nucleic acid scaffold, the core staples, and the functionalized staple sequences into designated wells of any suitable multi-well plates.

Generally, automation is preferred for the nanostructure library generation. Using synthesized stocks of staples, in combination with automated liquid handling and a liquid dispenser such as an Echo 555 nanofluidic dispenser, high-throughput combinatorial libraries of staples with scaffold are readily generated. Typically, for each structure, there are a scaffold strand and a set of core (i.e., non-functionalized) staples. First, the scaffold and core staples are dispensed to every well of any suitable multi-well plates. Any nano-droplet dispenser having the ability to rapidly dispense 0.5 nL to 100 nL from a source well to a destination well can be used. In preferred embodiments, an Echo 555 nano-droplet dispenser is used, with the ability to rapidly dispense 2.5 nL from a source well to a destination well.

In some embodiments, the source well contains functionalized oligonucleotide staples at a concentration of about 100 nM, 200 nM, 300 nM, 400 nM, 500 nM, 600 nM, 700 nM, 800 nM, 900 nM, 1 mM, or more than 1 mM. For example, in certain embodiments, 2.5 nL of functionalized oligonucleotide staples is transferred from a source well containing functionalized oligonucleotide staples at a concentration of more than 1 mM. For example, the Echo 555 nano-droplet dispenser is capable of transferring up to 60 droplets per second at a volume of about 2.5 nL.
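The dispensing figures above lend themselves to simple planning arithmetic. The sketch below is illustrative only: the function names are hypothetical, the default droplet volume and droplet rate are the values quoted in the text, and the dilution formula assumes ideal mixing of the transferred volume into the volume already in the destination well.

```python
import math


def droplets_needed(volume_nl, droplet_nl=2.5):
    """Number of whole droplets required to deliver at least volume_nl."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(volume_nl / droplet_nl, 9))


def transfer_time_s(volume_nl, droplet_nl=2.5, droplets_per_s=60.0):
    """Lower bound on transfer time at the quoted maximum droplet rate."""
    return droplets_needed(volume_nl, droplet_nl) / droplets_per_s


def final_concentration(stock_conc, transfer_nl, well_nl):
    """Concentration after adding transfer_nl of stock to well_nl of liquid.

    The result is in the same units as stock_conc (ideal mixing assumed).
    """
    return stock_conc * transfer_nl / (transfer_nl + well_nl)
```

For example, `droplets_needed(25.0)` is 10 droplets of 2.5 nL, and `transfer_time_s(150.0)` is 1 second at 60 droplets per second.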

Using the nano-droplet dispenser system, multiple 384-well plates of distributions of objects are readily generated from a source plate of functionalized staple strands that cover the geometric space allowable by DNA origami objects. The methodology is not limited to 384-well plates; any suitable plates that are compatible with high-throughput capability can be used, for example, 96-well plates, 384-well plates, and 1536-well plates. In some embodiments, the concentrations and volume requirements of the nucleic acid scaffold, the core staples, and the functionalized staple strands are taken into consideration when deciding on the plate format.
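Assigning library designs to wells across the plate formats mentioned above can be sketched as an index-to-address mapping. This is an assumption-laden illustration: the function names are hypothetical, wells are assumed to be filled row by row, and the row-by-column layouts (8×12, 16×24, 32×48) are the conventional ones for these plate sizes rather than anything specified in the text.

```python
import math

# Conventional (rows, columns) layouts for common microplate formats.
PLATE_SHAPES = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}


def row_label(row):
    """Row letters A..Z, then AA..AF for the 32-row 1536-well format."""
    if row < 26:
        return chr(ord("A") + row)
    return "A" + chr(ord("A") + row - 26)


def well_address(design_index, plate_format=384):
    """Map a zero-based design index to (plate number, well name)."""
    rows, cols = PLATE_SHAPES[plate_format]
    plate, within_plate = divmod(design_index, rows * cols)
    row, col = divmod(within_plate, cols)
    return plate, f"{row_label(row)}{col + 1}"


def plates_required(n_designs, plate_format=384):
    """Number of plates needed to hold n_designs, one design per well."""
    rows, cols = PLATE_SHAPES[plate_format]
    return math.ceil(n_designs / (rows * cols))
```

With one design per well, `plates_required(3000)` gives 8 plates in the 384-well format.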

In some embodiments, the nucleic acid scaffold, the core staples, and the functionalized staple strands are mixed and annealed by slowly lowering the temperature over the course of 1 to 48 hours. This process allows the staple strands to guide the folding of the scaffold into the final DNA nanostructures. In further embodiments, high-throughput thermocyclers are used to slowly anneal staples and scaffold to generate the target nanostructure library, resulting in six, seven, eight, nine, ten, or more than ten 384-well plates of objects, with maximized utility generated from the computational method. In preferred embodiments, more than ten 384-well plates of nanostructures are generated.

The high-throughput methods allow fast generation, so any number of nanostructures can be generated as desired for the library, for example, one thousand, two thousand, three thousand, four thousand, five thousand, six thousand, seven thousand, eight thousand, nine thousand, ten thousand, twenty thousand, thirty thousand, forty thousand, fifty thousand, one hundred thousand, one million, and more than one million nanostructures for assembly. In preferred embodiments, combinatorial libraries of objects with any geometry, size, sequence, and nick placement are generated, allowing for one million, or more than one billion, spatial overhang possibilities.

In some embodiments, liquid handling automation is used to generate approximately 3,000 of these space-covering geometric objects to test against target RNAs. In some embodiments, structural features and thermal stability of these target RNAs are characterized. In further embodiments, detection assays of nanostructure folding and stability using quantitative PCR, high-throughput fast analysis gels, and digestion analysis are used for assessing the DNA nanostructures as well as complexes with RNAs.

In some embodiments, the generated objects have designed staples with 3′ or 5′ single-stranded DNA overhangs distributed over the edges of the wireframe polyhedra with singular bait sequence occurrences per object design per well. In further embodiments, these bait sequences are tested for complementarity within the structure to reduce misfolding.
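One simple way to screen a bait sequence for unintended complementarity to the scaffold or to other strands in the structure is to find the longest stretch of the bait that could form Watson-Crick base pairs with a given strand. The sketch below is a hypothetical screen, not the test used in the described methods: it considers only perfect, contiguous, antiparallel DNA matches, and any acceptance threshold applied to its result would be a design choice of the user.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_complementary_run(bait, strand):
    """Length of the longest contiguous bait segment that can pair with strand.

    A bait segment pairs with the strand when the reverse complement of the
    segment occurs as a substring of the strand (both written 5' to 3').
    """
    rc = reverse_complement(bait)
    strand = strand.upper()
    best = 0
    for i in range(len(rc)):
        # Only look for matches longer than the best found so far.
        j = i + best + 1
        while j <= len(rc) and rc[i:j] in strand:
            best = j - i
            j += 1
    return best
```

A bait could then be flagged for redesign when `longest_complementary_run(bait, scaffold)` exceeds a user-chosen length.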

In some embodiments, the development of chip and single-well technologies in DNA synthesis of oligonucleotides allows for assembly of nanoscale objects having pools of different sets of staples in each well grown in, for example, a 384-well plate. In each case, purification techniques applied to single structures are applicable to this high-throughput system, typically via filtration and buffer exchange. In further embodiments, high-throughput, rapid-run gel based assays, selective cryo-EM structural studies, and quantitative PCR (qPCR) temperature melting analysis are used for structural analysis and validation. Additionally, fluorimetric or colorimetric read-outs are feasible using strand-displacement reaction cascades or triggered amplification upon RNA complexing. In some embodiments, structure-specific bar-codes or affinity capture tags are included within the scaffold or staple sequences. These tags or codes are used to record and identify desired characteristics, or to select specific nanostructures, or molecules complexed with the nanostructures.

B. Graphical User Interface

In a preferred set of embodiments the computer server receives input submitted through a graphical user interface (GUI). The GUI may be presented on an attached monitor or display and may accept input through a touch screen, attached mouse or pointing device, or from an attached keyboard. In some embodiments the GUI will be communicated across a network using an accepted standard to be rendered on a monitor or display attached to a client computer and capable of accepting input from one or more input devices attached to the client computer. In other embodiments, a phone interface can identify, read and/or run entered sequences.

In the exemplary embodiment, the GUI contains a target structure selection region where the user selects the parameters to be input. In this exemplary system a target structure is indicated by clicking, touching, highlighting or selecting one of the structures, or subsets of structures, that are listed. In preferred embodiments, the target structure is selected from a drop-down list. In some embodiments, the overall target structure is selected and then customized to include user-defined features. Customization may include drawing a model, such as a wireframe model, using any computer programs capable of such functions. Other parameters relating to the target structure, such as edge length, molecular weight, overall size, encapsulation volume, wire-frame model topology, etc., can also be specified.

In some embodiments, the GUI enables entering or uploading one or more template or guide sequences, such as nucleic acid sequences. For example, the GUI typically includes a text box for the user to input one or more parameters. In other embodiments, users may input any sequence or sequences for which they would like to design staple primers. The GUI may additionally or alternatively contain an interface for uploading a text file containing one or more query structures and/or sequences.

In a particular embodiment, a text file contains the geometric parameters of a target shape provided in a standard polyhedral file format. The geometric parameters of any closed, orientable surface network can serve as input using any file format that specifies polygonal geometry known in the art, including but not limited to, Polygon File Format (PLY), Stereolithography (STL), or Virtual Reality Modeling Language (WRL). When a standard polyhedral file format is provided, the code includes a parser to convert the standard polyhedral files into the required inputs.
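As an illustration of the kind of parser mentioned above, the following sketch reads an ASCII PLY file into vertex coordinates, face index lists, and the derived set of undirected edges. It is not the parser of the described code: the function name and return layout are hypothetical, only ASCII PLY is handled, and it assumes the vertex element precedes the face element, that the first three vertex properties are x, y, and z, and that each face line begins with its vertex count.

```python
def parse_ascii_ply(text):
    """Return (vertices, faces, edges) from the text of an ASCII PLY file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "ply":
        raise ValueError("not a PLY file")
    n_vertices = n_faces = 0
    body_start = None
    for i, ln in enumerate(lines):
        parts = ln.split()
        if parts[0] == "format" and parts[1] != "ascii":
            raise ValueError("only ASCII PLY is handled in this sketch")
        if parts[:2] == ["element", "vertex"]:
            n_vertices = int(parts[2])
        elif parts[:2] == ["element", "face"]:
            n_faces = int(parts[2])
        elif parts[0] == "end_header":
            body_start = i + 1
            break
    if body_start is None:
        raise ValueError("missing end_header")
    vertex_lines = lines[body_start:body_start + n_vertices]
    face_lines = lines[body_start + n_vertices:body_start + n_vertices + n_faces]
    vertices = [tuple(float(x) for x in ln.split()[:3]) for ln in vertex_lines]
    faces = []
    for ln in face_lines:
        nums = [int(x) for x in ln.split()]
        faces.append(tuple(nums[1:1 + nums[0]]))
    # Each face contributes its boundary edges; store them as sorted pairs.
    edges = set()
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            edges.add((min(a, b), max(a, b)))
    return vertices, faces, sorted(edges)


# A tetrahedron written by hand for illustration.
TETRAHEDRON_PLY = """ply
format ascii 1.0
comment hand-written example
element vertex 4
property float x
property float y
property float z
element face 4
property list uchar int vertex_indices
end_header
0 0 0
1 0 0
0 1 0
0 0 1
3 0 1 2
3 0 3 1
3 0 2 3
3 1 3 2
"""

vertices, faces, edges = parse_ascii_ply(TETRAHEDRON_PLY)
```

The resulting vertex, face, and edge lists form the node-edge network that a routing step would take as input.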

In embodiments that include both options, the GUI may also contain radio buttons that allow the user to select if the target sequence will be entered in a text box or uploaded from a text file. The GUI may include a button for choosing the file, may allow a user to drag and drop the intended file, or other means of having the file uploaded. Any of the parameters can be entered by hand to further customize.

The GUI also typically includes an interface for the user to initiate the methods based on the input model and/or other parameters. The exemplary GUI embodiment includes a submit button or tab that, when selected, initiates a search according to the user-entered or default criteria. The GUI can also include a reset button or tab that, when selected, removes the user input and/or restores the default settings.

The GUI will in some embodiments have an example button that, when selected by the user, populates all of the input fields with default values. The option selected by the example values may in some embodiments coincide with an example described in detail in a tutorial, manual, or help section. The GUI will in some embodiments contain all or only some of the elements described above. The GUI may contain any graphical user input element or combination thereof including one or more menu bars, text boxes, buttons, hyperlinks, drop-down lists, list boxes, combo boxes, check boxes, radio buttons, cycle buttons, data grids, or tabs.

IV. Nucleic Acid Nanostructures

Nucleic acid nanostructures designed according to the geometric parameters of a desired polyhedral shape, according to the methods for top-down design of polyhedral nucleic acid assemblies, are described. The polyhedral nucleic acid assemblies include a single-stranded nucleic acid scaffold sequence that is routed throughout the entire structure. The polyhedral nucleic acid assemblies optionally include oligonucleotide staple strands that hybridize to the scaffold sequence and create the polyhedral structure. When the polyhedral nucleic acid assemblies do not include staple strands, the scaffold sequence hybridizes to itself to create the polyhedral structure. The nucleic acid nanostructures designed according to the described methods include two or more nucleic acid duplexes per edge, and incorporate at least one parallel or anti-parallel crossover motif within at least one edge.

Modified nucleic acid nanostructures are also described. The nucleic acid nanostructures designed and assembled according to the described methods can include one or more modified nucleic acids, such as non-naturally occurring nucleic acids, derivatives and analogs. In some embodiments, the polyhedral structures are modified nucleic acid nanostructures that include one or more non-nucleic acid molecules. In other embodiments, the polyhedral structures are modified to include one or more nucleic acid sequences that are capable of binding or otherwise interacting with one or more non-nucleic acid molecules.

A. Nanostructure Assemblies Produced by Top-Down Design

Nucleic acid nanostructures having polyhedral morphology designed and produced according to the described top-down design methods are described. The polyhedral nucleic acid nanostructures include a single-stranded nucleic acid scaffold routed through the entire polyhedral structure.

The nucleic acid nanostructures can be of any desired shape that can be rendered as a three-dimensional wire-frame mesh with sharp angles and non-curved edges. The nucleic acid nanostructures include a single-stranded nucleic acid scaffold that is routed throughout the entire structure. The route of the single-stranded nucleic acid scaffold throughout every face of the structure is the Eulerian circuit through the node-edge network of the planar graph of the structure. Preferably, the Eulerian circuit that defines the path of the single-stranded scaffold sequence throughout the entire structure is the A-trail Eulerian circuit.

In some embodiments, the nanostructures include at least one edge having a DX crossover motif located within the center of the edge. In other embodiments, the nanostructures include at least one edge having a PX crossover motif located within the center of the edge. Typically, the nanostructures include zero or one scaffold crossover structures per edge. The placement of DX scaffold crossovers is defined by the maximum-breadth spanning tree of the node-edge network of the planar graph of the structure. Edges that form part of the maximum-breadth spanning tree are the only edges that do not include a DX scaffold crossover. Edges that do not form part of the maximum-breadth spanning tree are the only edges that include a single DX scaffold crossover.
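The rule that spanning-tree edges carry no scaffold crossover while every other edge carries one can be illustrated with a short graph routine. The sketch below uses a plain breadth-first spanning tree as a stand-in; whether this coincides with the maximum-breadth spanning tree of the described methods is an assumption, and the function name and data layout are hypothetical.

```python
from collections import deque


def assign_scaffold_crossovers(n_vertices, edges, root=0):
    """Map each edge (a, b) to 0 (tree edge) or 1 (non-tree edge).

    A breadth-first spanning tree rooted at `root` stands in for the
    spanning tree of the method; non-tree edges get one scaffold crossover.
    """
    adjacency = {v: [] for v in range(n_vertices)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in visited:
                visited.add(w)
                tree.add(frozenset((v, w)))
                queue.append(w)
    if len(visited) != n_vertices:
        raise ValueError("the edge network is not connected")
    return {tuple(e): (0 if frozenset(e) in tree else 1) for e in edges}


# Tetrahedron: 4 vertices, 6 edges.
tetra_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
crossovers = assign_scaffold_crossovers(4, tetra_edges)
```

Any spanning tree of a connected network with V vertices has V - 1 edges, so a network with E edges receives E - V + 1 scaffold crossovers under this rule, for example 3 for a tetrahedron.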

Nucleic acid nanostructures produced according to the methods include two anti-parallel nucleic acid helices along each edge to increase the rigidity of the structure.

The nucleic acid nanostructures are typically less than 1 micron in diameter, for example, 10 nm-1,000 nm, inclusive. In some embodiments, the nucleic acid nanostructures have overall dimensions of 50-500 nm, 60-200 nm, or 60-100 nm, for example, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm or larger than 100 nm. The molecular weight of the nanostructure is typically defined by the size and complexity of the polyhedral shape of the nanostructure. Typically, the nucleic acid nanostructures have a molecular weight of between 200 kilodaltons (kDa) and 1 megadalton (1 MDa). The volume encapsulated by the nanostructures is defined by the size and shape of the nanostructures, and can be determined from the dimensions.

Typically the nucleic acid nanostructures are stable in physiological concentrations of salt, for example, in PBS and DMEM.

1. Modified Nucleotides

In some embodiments, the nucleotides of the scaffolded DNA sequences are modified. For example, in some embodiments, one or more of the nucleotides of the DNA staple sequences are modified, or one or more of the nucleotides of the scaffold sequence are modified, or both nucleotides in the DNA staple sequences and nucleotides in the scaffold sequence are modified.

When modified nucleotides are incorporated into nucleic acid scaffold strands or oligonucleotide staple strands, the modified nucleotides can be incorporated as a percentage or ratio of the total nucleotides used in the preparation of the nucleic acids. In some embodiments, the modified nucleotides represent 0.1% or more than 0.1% of the total number of nucleotides in the sequence, up to or approaching 100% of the total nucleotides present. For example, the relative amount of modified nucleotides can be between 0.1% and 100% inclusive, such as 0.1%-0.5%, 1%-2%, 1%-5%, 1%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, or more than 50% of the total, up to and including 100%, such as 60%, 70%, 75%, 80%, 85%, 90%, 95% or 99% of the total. In a certain embodiment, a sequence of nucleic acids includes a single modified nucleotide, or two, or three modified nucleotides. In some embodiments, nucleic acid nanostructures contain one, or more than one, up to 100 modified nucleotides in every edge. In other embodiments, the number of modified nucleotides correlates with the size of the nanostructure, or the shape, or the number of faces or edges, or vertices of the nanostructure. For example, in some embodiments, nucleic acid nanostructures include the same or different numbers of modified nucleotides within every edge or vertex. In some embodiments, the modified nucleotides are present at the equivalent position in every structurally-equivalent edge of the nanostructure. In some embodiments, nucleic acid nanostructures include modified nucleotides at precise locations and in specific numbers or proportions as determined by the design process. Therefore, in some embodiments, nucleic acid assemblies include a defined number or percentage of modified nucleotides at specified positions within the structure. In some embodiments, nucleic acid nanostructures produced according to the described methods include more than a single type of modified nucleic acid.
In exemplary embodiments, nucleic acid nanostructures include one type of modified nucleic acid on every edge, or mixtures of two or more different modified nucleic acids on every edge. Therefore, when a single type of modified nucleic acid is present at an edge of the structure, each edge can include a different type of modification relative to every other edge.

Examples of modified nucleotides that can be included within the described nanostructures include, but are not limited to, diaminopurine, S²T, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Nucleic acid molecules may also be modified at the base moiety (e.g., at one or more atoms that typically are available to form a hydrogen bond with a complementary nucleotide and/or at one or more atoms that are not typically capable of forming a hydrogen bond with a complementary nucleotide), sugar moiety or phosphate backbone. Nucleic acid molecules may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP), to allow covalent attachment of amine-reactive moieties, such as N-hydroxysuccinimide (NHS) esters.

In some embodiments, a phosphorothioate-modified backbone on the DNA nucleotide staples or on the scaffold is used to improve stability of the DNA nanostructures to degradation by exonuclease. For example, in some embodiments the nucleic acid nanostructures include modified nucleic acids that protect one or more regions of the nanostructure from enzymic degradation or disruption in vivo. In some embodiments, nucleic acid nanostructures include modified nucleic acids at specific locations within the structure that direct the timing of the enzymic degradation of specific parts of the structure. For example, modifications can be designed to prevent degradation, or to enhance the likelihood of degradation of one or more edges before or after different edges within the same structure. In this way, modifications that enhance or reduce protection or enzymic degradation of one or more parts of a nanostructure in vivo can drive or facilitate structural changes in the structure, for example to enhance or alter the half-life of a given structure in vivo.

Locked nucleic acid (LNA) is a family of conformationally locked nucleotide analogues which, amongst other benefits, imposes truly unprecedented affinity and very high nuclease resistance to DNA and RNA oligonucleotides (Wahlestedt C, et al., Proc. Natl Acad. Sci. USA, 97:5633-5638 (2000); Braasch, D A, et al., Chem. Biol. 8:1-7 (2001); Kurreck J, et al., Nucleic Acids Res. 30:1911-1918 (2002)). In some embodiments, the nucleic acids are locked nucleic acids, which are synthetic RNA-like high-affinity nucleotide analogues. In some embodiments, the scaffolded DNAs are locked nucleic acids. In other embodiments, the staple strands are locked nucleic acids.

Peptide nucleic acid (PNA) is a nucleic acid analog in which the sugar phosphate backbone of natural nucleic acid has been replaced by a synthetic peptide backbone usually formed from N-(2-amino-ethyl)-glycine units, resulting in an achiral and uncharged mimic (Nielsen P E et al., Science 254, 1497-1500 (1991)). It is chemically stable and resistant to hydrolytic (enzymatic) cleavage. In some embodiments, the scaffolded DNAs are PNAs. In other embodiments, the staple strands are PNAs.

In some embodiments, PNAs, DNAs, RNAs, or LNAs are used to capture proteins or other small molecules of interest, or to target or otherwise interact with complementary binding sites on structured RNAs or DNAs. In other embodiments, a combination of PNAs, DNAs, RNAs and/or LNAs is used in the formation of structured nucleic acid nanostructures.

In some embodiments, the structured nanostructures include a combination of PNAs, DNAs, and/or LNAs. In some embodiments, a combination of PNAs, DNAs, and/or LNAs is used for the staple strands.

In some embodiments, the nucleic acids produced according to the described methods are modified to incorporate fluorescent molecules. Exemplary fluorescent molecules include fluorescent dyes and stains, such as Cy5-modified CTP.

In some embodiments, nucleic acid nanostructures include one or more nucleic acids conjugated to polymers. Exemplary polymers that can be conjugated to nucleic acids include biodegradable polymers, non-biodegradable polymers, cationic polymers and dendrimers. For example, a non-limiting list of polymers that can be coupled to nucleic acids within the nucleic acid nanostructures includes poly(beta-amino esters); aliphatic polyesters; polyphosphoesters; poly(L-lysine) containing disulfide linkages; poly(ethylenimine) (PEI); disulfide-containing polymers such as DTSP or DTBP crosslinked PEI; PEGylated PEI crosslinked with DTSP; crosslinked PEI with DSP; linear SS-PEI; DTSP-crosslinked linear PEI; and branched poly(ethylenimine sulfide) (b-PEIS). Typically, the polymer has a molecular weight of between 500 Da and 20,000 Da, inclusive, for example, approximately 1,000 Da to 10,000 Da, inclusive. In some embodiments, the polymer is ethylene glycol. In some embodiments, the polymer is polyethylene glycol. In an exemplary embodiment, one or more polymers are conjugated to the nucleic acids within one or more of the staples. Therefore, in some embodiments, one or more types of polymers conjugated to staple strands are used to coat the nucleic acid nanostructure with the one or more polymers. In some embodiments, one or more types of polymers conjugated to nucleic acids in the scaffold sequence are used to coat the DNA nucleic acid nanostructure with the one or more polymers.

2. Modified Nanostructures

Nucleic acid nanostructures designed and produced according to the described methods can be modified to include nucleic acids having a known function, or molecules other than nucleic acids. Exemplary additional elements include small molecules, proteins, peptides, nucleic acids, lipids, saccharides, or polysaccharides. For example, nucleic acid nanoparticles can be modified to include proteins or RNAs having a known function, such as antibodies or RNA aptamers having an affinity to one or more target molecules. Therefore, the nucleic acid nanostructures designed and produced according to the described methods can be functionalized nucleic acid nanostructures.

Nucleic acid nanostructures can include one or more functional molecules at one or more locations on or within the structure. In some embodiments, the functional moiety is located at one or more staple strands. In other embodiments, the functional moiety is located directly within the scaffold sequence of the nanostructure. In other embodiments, nanostructures include one or more functional moieties located within the scaffold sequence and within one or more staple sequences. When nanostructures include two or more functional moieties, the functional moieties can be the same, or different.

a. Interaction with Functional Molecules

Typically, nucleic acid nanostructures are modified by chemical or physical association with one or more functional molecules. Exemplary methods of conjugation include covalent or non-covalent linkages between the nanostructure and the functional molecule. In some embodiments, conjugation with functional molecules is through click-chemistry. In some embodiments, conjugation with functional molecules is through hybridization with one or more of the nucleic acid sequences present on the nanostructure.

i. Modified Staple Sequences

In some embodiments, nucleic acid nanostructures include one or more functional groups located at one or more staple strands. For example, in some embodiments, the nucleic acid nanostructures include modified staple strands that include single-stranded overhang sequences. In some embodiments, the overhang sequences are between 4 and 60 nucleotides. In preferred embodiments, the overhang sequences are between 4 and 25 nucleotides. In some embodiments, the overhang sequences are 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50 nucleotides in length.

In some embodiments, nanostructures include oligonucleotide staples extended at either the 5′ or 3′ ends by an unpaired region of nucleic acid, such as DNA, RNA, PNA, or LNA of known sequence. For example, in some embodiments the single-stranded nucleic acid includes a binding site for one or more functional moieties, such as nucleic acids, proteins or small molecules. Therefore, nucleic acid nanoparticles including staple strands extended to include one or more single-stranded nucleic acid binding sites for a functional nucleic acid, protein or small molecule are described. Nucleic acid nanoparticles including functional RNA, small molecules, or proteins are also described. The functionalized nanoparticles can include functional moieties displayed at the surface of the nanoparticle, or located within the inner volume of the nanoparticle. Typically, the location of the functional moiety is determined by the desired biological function of the nanoparticle.

Nucleic acid nanoparticles functionalized with one or more nucleic acid or non-nucleic acid moieties having a known biological function are provided.

In some embodiments, nucleic acid nanoparticles include staple strands extended to include one or more single-stranded nucleic acid sequences that are complementary to the loop region of an RNA, such as an mRNA. Loop regions of mRNA targets can be identified using methods known in the art. When sequences complementary to these loop regions are appended to one or more nanoparticle staple strands, the nanoparticle is capable of capturing the target RNA. Nanoparticles specifically bound to target RNA can be distinguished from those that are not bound to the target RNA using any assay known in the art, such as by gel mobility shift, and/or imaging by cryo-EM.
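By way of illustration only, the appending of a loop-complementary sequence to a staple strand can be sketched in a few lines of Python. The function names, the example sequences, and the choice of a DNA overhang are assumptions made for this sketch and are not part of the described methods; the 4 to 60 nucleotide length check reflects the overhang lengths recited above.

```python
# Illustrative sketch only; all names and sequences here are hypothetical.
# Maps each RNA base in the loop to the DNA base that pairs with it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def overhang_for_rna_loop(loop_rna: str) -> str:
    """Return a DNA overhang (5'->3') complementary to an RNA loop (5'->3')."""
    loop = loop_rna.upper()
    # Antiparallel pairing: complement each base, reading the loop backwards.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(loop))


def append_overhang(staple: str, overhang: str, end: str = "3'") -> str:
    """Extend a staple (written 5'->3') with an overhang at its 5' or 3' end."""
    if not 4 <= len(overhang) <= 60:
        raise ValueError("overhang length outside the 4-60 nucleotide range")
    if end == "3'":
        return staple + overhang
    if end == "5'":
        return overhang + staple
    raise ValueError("end must be \"5'\" or \"3'\"")


if __name__ == "__main__":
    bait = overhang_for_rna_loop("GAAAUC")
    print(append_overhang("ACGTACGT", bait))
```

In practice the loop sequence would come from a secondary-structure prediction or probing experiment for the target RNA; the sketch only covers the sequence bookkeeping.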

ii. Modified Scaffold Sequences

In some embodiments, nucleic acid nanostructures include a single-stranded scaffold nucleic acid sequence that is modified to include one or more sequences of nucleic acids that bind one or more functional moieties, such as nucleic acids, proteins or small molecules. In some embodiments, the scaffold includes an overhang sequence that includes one or more functionalizing sequences or moieties at the 5′ or 3′ ends. In other embodiments, the scaffold includes an internal functionalizing sequence or moiety, for example, within one or more nucleic acids that form part of an edge of the nanostructure.

iii. Encapsulation or Structural Enclosure

In some embodiments, nucleic acid nanostructures are designed to have a shape or three-dimensional form that encloses a volume suitable to contain one or more functional molecules. For example, in some embodiments, the nanostructures are designed to have the shape of a cup, box, vase or other open structure enclosing a volume, into which one or more functional molecules can be loaded or inserted. In some embodiments, insertion or loading of functional molecules to within the inner space of the nucleic acid nanostructure is directed through the presence of capture tags within or near the interior space of the structure. In some embodiments, functional molecules that are located within the inner space of the structure are maintained within the structure by the addition of one or more additional molecules, for example, to “block” or otherwise sterically prevent the release of the contained molecule. Therefore, in some embodiments, nucleic acid nanostructures are designed to include a “lid” or other structured nucleic acid form that encapsulates a loaded or “captured” functional molecule within the inner space of the nanostructure. In some embodiments, access to the inner space of nucleic acid nanostructures is mediated by a structural or conformational change in the structure. Therefore, in some embodiments, the encapsulation of a functional molecule and/or release of the functional molecule from the inner space is controlled by one or more external factors that induce a conformational change in the nanostructure.

b. Functional Molecules

Nucleic acid nanostructures including nucleic acid overhang sequences can capture one or more functional moieties, including but not limited to single-guide- or crispr-RNAs (crRNA), anti-sense DNA, anti-sense RNA as well as DNA coding for proteins, mRNA, miRNA, piRNA and siRNA, DNA-interacting proteins such as CRISPR, TAL effector proteins, or zinc-finger proteins, lipids, and carbohydrates. In other embodiments, nucleic acid nanoparticles are modified with naturally or non-naturally occurring nucleotides having a known biological function. Exemplary functional groups include targeting elements, immunomodulatory elements, chemical groups, biological macromolecules, and combinations thereof.

In some embodiments, functionalized nucleic acid nanostructures include one or more single-strand overhang or scaffold DNA sequences that are complementary to the loop region of an RNA, such as an mRNA. Nucleic acid nanoparticles functionalized with mRNAs encoding one or more proteins are described. In one exemplary case, a tetrahedron (or any other object that can be designed from the procedure) can be functionalized with 3 (or 1, 2, or more than 3) single-strand overhang DNA sequences that are complementary to the loop region of an RNA, for example an mRNA expressing a protein.

i. Targeting Elements

Targeting elements can be added to the staple strands of the DNA nanostructures, to enhance targeting of the nanostructures to one or more cells or tissues, or to mediate specific binding to a protein, lipid, polysaccharide, nucleic acid, etc. For example, for use as biosensors, additional nucleotide sequences are included as overhang sequences on the staple strands.

Exemplary targeting elements include proteins, peptides, nucleic acids, lipids, saccharides, or polysaccharides that bind to one or more targets associated with an organ, tissue, cell, or extracellular matrix, or specific type of tumor or infected cell. The degree of specificity with which the nucleic acid nanostructures are targeted can be modulated through the selection of a targeting molecule with the appropriate affinity and specificity. For example, antibodies, or antigen-binding fragments thereof, are very specific.

Typically, the targeting moieties exploit the surface-markers specific to a biologically functional class of cells, such as antigen presenting cells. Dendritic cells express a number of cell surface receptors that can mediate endocytosis. In some embodiments, overhang sequences include nucleotide sequences that are complementary to nucleotide sequences of interest, for example HIV-1 RNA viral genome.

Additional functional groups can be introduced on the staple strand, for example by incorporating a biotinylated nucleotide into the staple strand. Any streptavidin-coated targeting molecules can then be introduced via the biotin-streptavidin interaction. In other embodiments, non-naturally occurring nucleotides are included to provide desired functional groups for further modification. Exemplary functional groups include targeting elements, immunomodulatory elements, chemical groups, biological macromolecules, and combinations thereof.

Typically, the targeting moieties exploit the surface-markers specific to a group of cells to be targeted. Exemplary targeting elements include proteins, peptides, nucleic acids, lipids, saccharides, or polysaccharides that bind to one or more targets associated with a cell, or extracellular matrix, or specific type of tumor or infected cell. The degree of specificity with which the delivery vehicles are targeted can be modulated through the selection of a targeting molecule with the appropriate affinity and specificity. For example, antibodies, or antigen-binding fragments thereof, are very specific.

(a) Antibodies

In some embodiments, nucleic acid nanostructures are modified to include one or more antibodies. Antibodies that function by binding directly to one or more epitopes, other ligands, or accessory molecules at the surface of cells can be coupled directly or indirectly to the nanostructures. In some embodiments, the antibody or antigen binding fragment thereof has affinity for a receptor at the surface of a specific cell type, such as a receptor expressed at the surface of macrophage cells, dendritic cells, or epithelial lining cells. In some embodiments the antibody binds one or more target receptors at the surface of a cell that enables, enhances or otherwise mediates cellular uptake of the antibody-bound nanostructure, or intracellular translocation of the antibody-bound nanostructure, or both.

Any specific antibody can be used to modify the nucleic acid nanostructures. For example, antibodies can include an antigen binding site that binds to an epitope on the target cell. Binding of an antibody to a “target” cell can enhance or induce uptake of the associated nucleic acid nanostructures by the target cell protein via one or more distinct mechanisms.

In some embodiments, the antibody or antigen binding fragment binds specifically to an epitope. The epitope can be a linear epitope. The epitope can be specific to one cell type or can be expressed by multiple different cell types. In other embodiments, the antibody or antigen binding fragment thereof can bind a conformational epitope that includes a 3-D surface feature, shape, or tertiary structure at the surface of the target cell.

In some embodiments, the antibody or antigen binding fragment that binds specifically to an epitope on the target cell can only bind if the protein epitope is not bound by a ligand or small molecule.

Various types of antibodies and antibody fragments can be used to modify nucleic acid nanostructures, including whole immunoglobulin of any class, fragments thereof, and synthetic proteins containing at least the antigen binding variable domain of an antibody. The antibody can be an IgG antibody, such as IgG1, IgG2, IgG3, or IgG4 subtypes. An antibody can be in the form of an antigen binding fragment including a Fab fragment, F(ab′)2 fragment, a single chain variable region, and the like. Antibodies can be polyclonal, or monoclonal (mAb). Monoclonal antibodies include “chimeric” antibodies in which a portion of the heavy and/or light chain is identical with or homologous to corresponding sequences in antibodies derived from a particular species or belonging to a particular antibody class or subclass, while the remainder of the chain(s) is identical with or homologous to corresponding sequences in antibodies derived from another species or belonging to another antibody class or subclass, as well as fragments of such antibodies, so long as they specifically bind the target antigen and/or exhibit the desired biological activity (U.S. Pat. No. 4,816,567; and Morrison, et al., Proc. Natl. Acad. Sci. USA, 81: 6851-6855 (1984)). The antibodies can also be modified by recombinant means, for example by deletions, additions or substitutions of amino acids, to increase efficacy of the antibody in mediating the desired function. Substitutions can be conservative substitutions. For example, at least one amino acid in the constant region of the antibody can be replaced with a different residue (see, e.g., U.S. Pat. Nos. 5,624,821; 6,194,551; WO 9958572; and Angal, et al., Mol. Immunol. 30:105-08 (1993)). In some cases changes are made to reduce undesired activities, e.g., complement-dependent cytotoxicity. The antibody can be a bi-specific antibody having binding specificities for at least two different antigenic epitopes. In one embodiment, the epitopes are from the same antigen. In another embodiment, the epitopes are from two different antigens. Bi-specific antibodies can include bi-specific antibody fragments (see, e.g., Holliger, et al., Proc. Natl. Acad. Sci. USA., 90:6444-48 (1993); Gruber, et al., J. Immunol., 152:5368 (1994)).

Antibodies that target the nucleic acid nanostructures to a specific epitope can be generated by any means known in the art. Exemplary descriptions of means for antibody generation and production include Delves, Antibody Production: Essential Techniques (Wiley, 1997); Shephard, et al., Monoclonal Antibodies (Oxford University Press, 2000); Goding, Monoclonal Antibodies: Principles And Practice (Academic Press, 1993); and Current Protocols In Immunology (John Wiley & Sons, most recent edition). Fragments of intact Ig molecules can be generated using methods well known in the art, including enzymatic digestion and recombinant means.

(b) Capture Tags

In some embodiments, nanostructures include one or more sequences of nucleic acids that act as capture tags, or “bait” sequences, to specifically bind one or more targeted molecules. For example, in some embodiments, overhang sequences include nucleotide “bait” sequences that are complementary to any target nucleotide sequence, for example HIV-1 RNA viral genome. In further embodiments, functional groups are present on one or more staple strands to act as capture tags. For example, in some embodiments, one or more biotinylated nucleotides are incorporated into the staple strand. Streptavidin-coated molecules can then be introduced via the biotin-streptavidin interaction.

Typically, targeting moieties exploit the surface-markers specific to a group of cells to be targeted. Exemplary targeting elements include proteins, peptides, nucleic acids, lipids, saccharides, or polysaccharides that bind to one or more targets associated with a cell, or extracellular matrix, or specific type of tumor or infected cell. Targeting molecules can be selected based on the desired physical properties, such as the appropriate affinity and specificity for the target. Exemplary targeting molecules having high specificity and affinity include antibodies, or antigen-binding fragments thereof. Therefore, in some embodiments, nucleic acid nanostructures include one or more antibodies or antigen binding fragments specific to an epitope. The epitope can be a linear epitope. The epitope can be specific to one cell type or can be expressed by multiple different cell types. In other embodiments, the antibody or antigen binding fragment thereof can bind a conformational epitope that includes a 3-D surface feature, shape, or tertiary structure at the surface of the target cell.

ii. Functional Nucleic Acids

In some embodiments, the nucleic acid nanostructures include one or more functional nucleic acids. Functional nucleic acids that inhibit the transcription, translation or function of a target gene are described.

Functional nucleic acids are nucleic acid molecules that have a specific function, such as binding a target molecule or catalyzing a specific reaction. As discussed in more detail below, functional nucleic acid molecules can be divided into the following non-limiting categories: antisense molecules, siRNA, miRNA, aptamers, ribozymes, triplex forming molecules, RNAi, and external guide sequences. The functional nucleic acid molecules can act as effectors, inhibitors, modulators, and stimulators of a specific activity possessed by a target molecule, or the functional nucleic acid molecules can possess a de novo activity independent of any other molecules.

Functional nucleic acid molecules can interact with any macromolecule, such as DNA, RNA, polypeptides, or carbohydrate chains. Thus, functional nucleic acids can interact with the mRNA or the genomic DNA of a target polypeptide or they can interact with the target polypeptide itself. Functional nucleic acids are often designed to interact with other nucleic acids based on sequence homology between the target molecule and the functional nucleic acid molecule. In other situations, the specific recognition between the functional nucleic acid molecule and the target molecule is not based on sequence homology between the functional nucleic acid molecule and the target molecule, but rather is based on the formation of tertiary structure that allows specific recognition to take place. Therefore, the compositions can include one or more functional nucleic acids designed to reduce expression or function of a target protein.

Methods of making and using vectors for in vivo expression of the described functional nucleic acids such as antisense oligonucleotides, siRNA, shRNA, miRNA, EGSs, ribozymes, and aptamers are known in the art.

(a) Antisense Molecules

The functional nucleic acids can be antisense molecules. Antisense molecules are designed to interact with a target nucleic acid molecule through either canonical or non-canonical base pairing. The interaction of the antisense molecule and the target molecule is designed to promote the destruction of the target molecule through, for example, RNase H mediated RNA-DNA hybrid degradation. Alternatively, the antisense molecule is designed to interrupt a processing function that normally would take place on the target molecule, such as transcription or replication. Antisense molecules can be designed based on the sequence of the target molecule. There are numerous methods for optimization of antisense efficiency by finding the most accessible regions of the target molecule. Exemplary methods include in vitro selection experiments and DNA modification studies using DMS and DEPC. It is preferred that antisense molecules bind the target molecule with a dissociation constant (Kd) less than or equal to 10⁻⁶, 10⁻⁸, 10⁻¹⁰, or 10⁻¹² M.

(b) Aptamers

The functional nucleic acids can be aptamers. Aptamers are molecules that interact with a target molecule, preferably in a specific way. Typically aptamers are small nucleic acids ranging from 15-50 bases in length that fold into defined secondary and tertiary structures, such as stem-loops or G-quartets. Aptamers can bind small molecules, such as ATP and theophylline, as well as large molecules, such as reverse transcriptase and thrombin. Aptamers can bind very tightly, with Kd's for the target molecule of less than 10⁻¹² M. It is preferred that the aptamers bind the target molecule with a Kd less than 10⁻⁶, 10⁻⁸, 10⁻¹⁰, or 10⁻¹² M. Aptamers can bind the target molecule with a very high degree of specificity. For example, aptamers have been isolated that have greater than a 10,000 fold difference in binding affinities between the target molecule and another molecule that differ at only a single position on the molecule. It is preferred that the aptamer have a Kd with the target molecule at least 10, 100, 1000, 10,000, or 100,000 fold lower than the Kd with a background binding molecule. It is preferred when doing the comparison for a molecule such as a polypeptide, that the background molecule be a different polypeptide.
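The fold-selectivity comparison described above amounts to a ratio of dissociation constants. The following Python sketch, with hypothetical function names and made-up Kd values, illustrates how a candidate aptamer could be checked against one of the recited thresholds; it is an arithmetic illustration, not a measurement protocol.

```python
# Illustrative sketch only; names and numeric values are hypothetical.

def selectivity_fold(kd_target_m: float, kd_background_m: float) -> float:
    """Ratio of background Kd to target Kd (both in molar units).

    A larger value means the aptamer binds the target more tightly
    relative to the background molecule.
    """
    if kd_target_m <= 0 or kd_background_m <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_background_m / kd_target_m


def meets_preference(kd_target_m: float, kd_background_m: float,
                     max_kd_m: float = 1e-6, min_fold: float = 10.0) -> bool:
    """True if Kd is below max_kd_m and selectivity is at least min_fold."""
    tight_enough = kd_target_m < max_kd_m
    selective_enough = selectivity_fold(kd_target_m, kd_background_m) >= min_fold
    return tight_enough and selective_enough


if __name__ == "__main__":
    # Made-up example: 1 nM for the target, 100 uM for a background molecule.
    print(selectivity_fold(1e-9, 1e-4))
    print(meets_preference(1e-9, 1e-4, max_kd_m=1e-8, min_fold=10_000))
```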

(c) Ribozymes

The functional nucleic acids can be ribozymes. Ribozymes are nucleic acid molecules that are capable of catalyzing a chemical reaction, either intra-molecularly or inter-molecularly. It is preferred that the ribozymes catalyze intermolecular reactions. Different types of ribozymes that catalyze nuclease or nucleic acid polymerase-type reactions which are based on ribozymes found in natural systems, such as hammerhead ribozymes, are described. Ribozymes that are not found in natural systems, but which have been engineered to catalyze specific reactions de novo, are also described. Preferred ribozymes cleave RNA or DNA substrates, and more preferably cleave RNA substrates. Ribozymes typically cleave nucleic acid substrates through recognition and binding of the target substrate with subsequent cleavage. This recognition is often based mostly on canonical or non-canonical base pair interactions. This property makes ribozymes particularly good candidates for targeting specific cleavage of nucleic acids because recognition of the target substrate is based on the target substrate's sequence.

(d) Triplex Forming Nucleotides

The functional nucleic acids can be triplex forming oligonucleotide molecules. Triplex forming functional nucleic acid molecules are molecules that can interact with either double-stranded or single-stranded nucleic acid. When triplex molecules interact with a target region, a structure called a triplex is formed, in which three strands of DNA form a complex dependent on both Watson-Crick and Hoogsteen base-pairing. Triplex molecules are preferred because they can bind target regions with high affinity and specificity. It is preferred that the triplex forming molecules bind the target molecule with a Kd less than 10⁻⁶, 10⁻⁸, 10⁻¹⁰, or 10⁻¹² M.

(e) External Guide Sequences

The functional nucleic acids can be external guide sequences. External guide sequences (EGSs) are molecules that bind a target nucleic acid molecule forming a complex, which is recognized by RNase P, which then cleaves the target molecule. EGSs can be designed to specifically target an RNA molecule of choice. RNase P aids in processing transfer RNA (tRNA) within a cell. Bacterial RNase P can be recruited to cleave virtually any RNA sequence by using an EGS that causes the target RNA:EGS complex to mimic the natural tRNA substrate. Similarly, eukaryotic EGS/RNase P-directed cleavage of RNA can be utilized to cleave desired targets within eukaryotic cells. Representative examples of how to make and use EGS molecules to facilitate cleavage of a variety of different target molecules are known in the art.

(f) RNA Interference

In some embodiments, the functional nucleic acids induce gene silencing through RNA interference (RNAi). Expression of a target gene can be effectively silenced in a highly specific manner through RNA interference.

Gene silencing was originally observed with the addition of double stranded RNA (dsRNA) (Fire, et al. (1998) Nature, 391:806-11; Napoli, et al. (1990) Plant Cell 2:279-89; Hannon, (2002) Nature, 418:244-51). Once dsRNA enters a cell, it is cleaved by an RNase III-like enzyme called Dicer, into double stranded small interfering RNAs (siRNA) 21-23 nucleotides in length that contain 2 nucleotide overhangs on the 3′ ends (Elbashir, et al., Genes Dev., 15:188-200 (2001); Bernstein, et al., Nature, 409:363-6 (2001); Hammond, et al., Nature, 404:293-6 (2000); Nykanen, et al., Cell, 107:309-21 (2001); Martinez, et al., Cell, 110:563-74 (2002)). The effect of iRNA or siRNA or their use is not limited to any type of mechanism.
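The geometry of such a duplex, two strands of equal length with 2 nucleotide 3′ overhangs, can be made concrete with a short Python sketch. The function names, the 21 nucleotide default, and the example sequence are assumptions for illustration only; the sketch models the strand bookkeeping and makes no claim about silencing efficacy or about how Dicer selects cleavage sites.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def sirna_duplex(target_rna: str, start: int, length: int = 21, overhang: int = 2):
    """Return (sense, antisense) strands, each written 5'->3'.

    The sense strand copies target_rna[start:start + length]. The antisense
    strand pairs with the first (length - overhang) sense bases and extends
    `overhang` bases further at its own 3' end, so both strands end in an
    unpaired 3' overhang.
    """
    paired = length - overhang
    if start < overhang or start + length > len(target_rna):
        raise ValueError("window does not fit inside the target sequence")
    sense = target_rna[start:start + length].upper()
    antisense = reverse_complement(target_rna[start - overhang:start + paired])
    return sense, antisense


if __name__ == "__main__":
    target = "GGAUCCAUGGCUAGCUAGGCAUCGAUCGGAUU"
    sense, antisense = sirna_duplex(target, start=2)
    print(sense)
    print(antisense)
```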

In one embodiment, a siRNA triggers the specific degradation of homologous RNA molecules, such as mRNAs, within the region of sequence identity between both the siRNA and the target RNA. Sequence specific gene silencing can be achieved in mammalian cells using synthetic, short double-stranded RNAs that mimic the siRNAs produced by the enzyme Dicer (Elbashir, et al., Nature, 411:494-498 (2001); Ui-Tei, et al., FEBS Lett, 479:79-82 (2000)). siRNA can be chemically or in vitro-synthesized or can be the result of short double-stranded hairpin-like RNAs (shRNAs) that are processed into siRNAs inside the cell. For example, WO 02/44321 describes siRNAs capable of sequence-specific degradation of target mRNAs when base-paired with 3′ overhanging ends, herein incorporated by reference for the method of making these siRNAs. Synthetic siRNAs are generally designed using algorithms and a conventional DNA/RNA synthesizer. Suppliers include Ambion (Austin, Tex.), ChemGenes (Ashland, Mass.), Dharmacon (Lafayette, Colo.), Glen Research (Sterling, Va.), MWG Biotech (Ebersberg, Germany), Proligo (Boulder, Colo.), and Qiagen (Venlo, The Netherlands). siRNA can also be synthesized in vitro using kits such as Ambion's SILENCER® siRNA Construction Kit.

Therefore, in some embodiments, the composition includes a vector expressing the siRNA. The production of siRNA from a vector is more commonly done through the transcription of short hairpin RNAs (shRNAs). Kits for the production of vectors including shRNA are available, such as, for example, Imgenex's GENESUPPRESSOR™ Construction Kits and Invitrogen's BLOCK-IT™ inducible RNAi plasmid and lentivirus vectors. In some embodiments, the functional nucleic acid is siRNA, shRNA, or miRNA.

iii. Gene Editing Molecules

In certain embodiments, the nucleic acid nanostructures are functionalized to include gene editing moieties, or to include components capable of binding to gene editing moieties. Exemplary gene-editing moieties that can be included within or bound to nucleic acid nanoparticles are CRISPR RNAs, for gene editing through the CRISPR/Cas system.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is an acronym for DNA loci that contain multiple, short, direct repetitions of base sequences. The prokaryotic CRISPR/Cas system has been adapted for gene editing (silencing, enhancing or changing specific genes) in eukaryotes (see, for example, Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-21 (2012)). By transfecting a cell with the required elements including a cas gene and specifically designed CRISPRs, the organism's genome can be cut and modified at any desired location. Methods of preparing compositions for use in genome editing using the CRISPR/Cas systems are described in detail in WO 2013/176772 and WO 2014/018423, which are specifically incorporated by reference herein in their entireties.

In general, “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. One or more tracr mate sequences operably linked to a guide sequence (e.g., direct repeat-spacer-direct repeat) can also be referred to as pre-crRNA (pre-CRISPR RNA) before processing or crRNA after processing by a nuclease.

In some embodiments, a tracrRNA and crRNA are linked and form a chimeric crRNA-tracrRNA hybrid where a mature crRNA is fused to a partial tracrRNA via a synthetic stem loop to mimic the natural crRNA:tracrRNA duplex, as described in Cong, et al., Science, 339(6121):819-823 (2013) and Jinek, et al., Science, 337(6096):816-21 (2012). A single fused crRNA-tracrRNA construct can also be referred to as a guide RNA or gRNA (or single-guide RNA (sgRNA)). Within an sgRNA, the crRNA portion can be identified as the “target sequence” and the tracrRNA is often referred to as the “scaffold.”

There are many resources available for helping practitioners determine suitable target sites once a desired DNA target sequence is identified. For example, numerous public resources, including a bioinformatically generated list of about 190,000 potential sgRNAs, targeting more than 40% of human exons, are available to aid practitioners in selecting target sites and designing the associated sgRNA to effect a nick or double strand break at the site. See also, crispr.u-psud.fr/, a tool designed to help scientists find CRISPR targeting sites in a wide range of species and generate the appropriate crRNA sequences.

In some embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a target cell such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. While the specifics can be varied in different engineered CRISPR systems, the overall methodology is similar. A practitioner interested in using CRISPR technology to target a DNA sequence (such as CTPS1) can insert a short DNA fragment containing the target sequence into a guide RNA expression plasmid. The sgRNA expression plasmid contains the target sequence (about 20 nucleotides), a form of the tracrRNA sequence (the scaffold) as well as a suitable promoter and necessary elements for proper processing in eukaryotic cells. Such vectors are commercially available (see, for example, Addgene). Many of the systems rely on custom, complementary oligos that are annealed to form a double stranded DNA and then cloned into the sgRNA expression plasmid. Co-expression of the sgRNA and the appropriate Cas enzyme from the same or separate plasmids in transfected cells results in a single or double strand break (depending on the activity of the Cas enzyme) at the desired target site.
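By way of non-limiting illustration, the selection of candidate target sequences of about 20 nucleotides can be sketched computationally. The sketch below assumes a protospacer-adjacent motif of the form NGG located immediately 3′ of the target on the scanned strand, as is commonly used with Streptococcus pyogenes Cas9; this motif, the function name `find_candidate_targets`, and its parameters are assumptions introduced for illustration and are not taken from the methods described above. Other Cas enzymes use different motifs, and only the forward strand is scanned.

```python
import re


def find_candidate_targets(dna, target_len=20, pam_regex="[ACGT]GG"):
    """Return (start, target, pam) for each forward-strand site where a
    target_len-nt stretch is immediately followed by a PAM match.

    The NGG motif is an assumed default; it is not specified in the text.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (target_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Toy input: 20 nt of target followed by a TGG motif.
    for start, target, pam in find_candidate_targets("ACGTACGTACGTACGTACGT" + "TGG"):
        print(start, target, pam)
```

A practical design tool would additionally scan the reverse complement and score candidates for off-target matches; those steps are omitted here.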

In an exemplary embodiment, crRNA can be extended 3′ and CRISPR-Cpf1 loaded with this crRNA can be used to capture this protein/RNA complex, as assayed by gel mobility shift and dual staining with a DNA-specific stain and a protein-specific stain.

In another embodiment, CRISPR-Cpf1 is complexed with crRNA targeting a sequence in the EGFP gene. The cross-beam was made to be a duplex that contains this specific sequence, but could be homologous to the target sequence with 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 nucleotides (20 nucleotides in the example case). The CRISPR-Cpf1/crRNA complex was found to bind to the nanoparticle as assayed by gel mobility shift and dual staining for DNA and protein material. Molecular models of nucleic acid tetrahedra conjugated with RNA (FIG. 16A), and proteins (FIGS. 16B-16C) represent each of three different schemes for using nucleic acid nanostructures for the capture of other molecules. FIG. 16A shows an open wireframe tetrahedron nanostructure (56) coupled to an mRNA (58) using single strand DNA overhangs extended from the staples at nick positions (60), with the sequence of the overhang complementary to predicted loops in the RNA structure. FIG. 16B depicts CRISPR enzyme Cpf1 (62) with crRNA (64) captured onto a DNA nanoparticle (56) on a crossbeam built into the nanoparticle (66), which contains a sequence targeted by the Cpf1/crRNA enzyme. FIG. 16C depicts CRISPR enzyme Cpf1 (62) with crRNA (64) captured onto a DNA nanoparticle (56) on an overhang sequence built into the nanoparticle (68), which contains a sequence complementary to a 3′ extension of the crRNA.

iv. Zinc Finger Nucleases

In some embodiments, the nucleic acid nanostructures include a nucleic acid construct or constructs encoding a zinc finger nuclease (ZFN). ZFNs are typically fusion proteins that include a DNA-binding domain derived from a zinc-finger protein linked to a cleavage domain.

The most common cleavage domain is the Type IIS enzyme FokI. FokI catalyzes double-stranded cleavage of DNA, at 9 nucleotides from its recognition site on one strand and 13 nucleotides from its recognition site on the other. See, for example, U.S. Pat. Nos. 5,356,802; 5,436,150 and 5,487,994; as well as Li et al. Proc. Natl. Acad. Sci. USA, 89:4275-4279 (1992); Li et al. Proc. Natl. Acad. Sci. USA, 90:2764-2768 (1993); Kim et al. Proc. Natl. Acad. Sci. USA, 91:883-887 (1994a); Kim et al. J. Biol. Chem. 269:31,978-31,982 (1994b). One or more of these enzymes (or enzymatically functional fragments thereof) can be used as a source of cleavage domains.

The DNA-binding domain, which can, in principle, be designed to target any genomic location of interest, can be a tandem array of Cys₂His₂ zinc fingers, each of which generally recognizes three to four nucleotides in the target DNA sequence. The Cys₂His₂ domain has a general structure: Phe (sometimes Tyr)-Cys-(2 to 4 amino acids)-Cys-(3 amino acids)-Phe (sometimes Tyr)-(5 amino acids)-Leu-(2 amino acids)-His-(3 amino acids)-His. By linking together multiple fingers (the number varies: three to six fingers have been used per monomer in published studies), ZFN pairs can be designed to bind to genomic sequences 18-36 nucleotides long.
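The 18-36 nucleotide range stated above follows from simple arithmetic if each finger is taken to contact three nucleotides and a ZFN pair is taken to consist of two monomers. The following sketch makes that calculation explicit; the function name and its defaults are illustrative assumptions, and spacer sequence between the two half-sites is not counted.

```python
def zfn_pair_site_length(fingers_per_monomer, nt_per_finger=3, monomers=2):
    """Total nucleotides contacted by a ZFN pair.

    Assumes every finger contacts nt_per_finger nucleotides and ignores
    any spacer between the two half-sites.
    """
    return fingers_per_monomer * nt_per_finger * monomers


if __name__ == "__main__":
    for fingers in range(3, 7):
        print(fingers, "fingers per monomer:", zfn_pair_site_length(fingers), "nt")
```

With three fingers per monomer the pair contacts 18 nucleotides, and with six fingers per monomer it contacts 36 nucleotides, matching the range given above.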

Engineering methods include, but are not limited to, rational design and various types of empirical selection methods. Rational design includes, for example, using databases including triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, U.S. Pat. Nos. 6,140,081; 6,453,242; 6,534,261; 6,610,512; 6,746,838; 6,866,997; 7,067,617; U.S. Published Application Nos. 2002/0165356; 2004/0197892; 2007/0154989; 2007/0213269; and International Patent Application Publication Nos. WO 98/53059 and WO 2003/016496.

v. mRNA

In some embodiments, nucleic acid nanostructures are modified by covalent or non-covalent association with an RNA that encodes one or more polypeptides, such as a protein. Therefore, in some embodiments, nucleic acid nanostructures are modified to include one or more messenger RNA molecules (mRNA). The messenger RNA can encode any protein or polypeptide. For example, in some embodiments, nucleic acid nanostructures are modified to include one or more mRNAs, each encoding one or more proteins. In an exemplary embodiment, the mRNA encodes a fluorescent protein or fluorophore. Exemplary fluorescent proteins include mCherry, mPlum, mRaspberry, mStrawberry, tdTomato, GFP, EBFP, Azurite, T-Sapphire, Emerald, Topaz, Venus, mOrange, AsRed2, and J-Red. In some embodiments, nucleic acid nanostructures are modified to include one or more messenger RNA molecules that encode one or more polypeptides, such as a protein that is an antigen.

vi. Antigens

In some embodiments, nucleic acid nanostructures are modified by covalent or non-covalent association with an antigen. Exemplary antigens include B cell antigens and T cell antigens. B cell antigens can be peptides, proteins, polysaccharides, saccharides, lipids, nucleic acids, small molecules (alone or with a hapten) or combinations thereof. T cell antigens are proteins or peptides. The antigen can be derived from a virus, bacterium, parasite, plant, protozoan, fungus, tissue or transformed cell such as a cancer or leukemic cell and can be a whole cell or immunogenic component thereof, e.g., cell wall components or molecular components thereof. Suitable antigens are known in the art and are available from commercial, government and scientific sources. The antigens may be purified or partially purified polypeptides derived from tumors or viral or bacterial sources. The antigens can be recombinant polypeptides produced by expressing DNA encoding the polypeptide antigen in a heterologous expression system. The antigens can be DNA encoding all or part of an antigenic protein. Antigens may be provided as single antigens or may be provided in combination. Antigens may also be provided as complex mixtures of polypeptides or nucleic acids. In some embodiments the antigen is a viral antigen. A viral antigen can be isolated from any virus. In some embodiments the antigen is a bacterial antigen. Bacterial antigens can originate from any bacteria. In some embodiments the antigen is a parasite antigen. In some embodiments the antigen is an allergen or environmental antigen. Exemplary allergens and environmental antigens include, but are not limited to, an antigen derived from naturally occurring allergens such as pollen allergens (tree-, herb-, weed-, and grass pollen allergens), insect allergens (inhalant, saliva and venom allergens), animal hair and dandruff allergens, and food allergens. In some embodiments the antigen is a tumor antigen. Exemplary tumor antigens include a tumor-associated or tumor-specific antigen.

vii. Therapeutic or Prophylactic Agents

In some embodiments, nucleic acid nanostructures are modified by covalent or non-covalent association with a therapeutic agent, or a prophylactic agent, or a diagnostic agent. For example, one or more therapeutic, prophylactic, or diagnostic agents can be associated with the exterior of the nucleic acid nanoparticle, or packaged within the interior space of the nucleic acid nanoparticle, according to the design of the particle and location of the capture tag or site of interaction with the therapeutic, prophylactic or diagnostic agent. A non-limiting list of active agents that can be encapsulated within, or otherwise associated with the nucleic acid nanoparticle includes anti-infectives, immunomodifying agents, hormones, antioxidants, steroids, anti-proliferative agents and diagnostic agents. Therapeutic agents can include a drug or modified form of drug such as prodrugs and analogs.

Examples of agents include, but are not limited to, beta-lactam antibiotics (including penicillins such as ampicillin, and cephalosporins selected in turn from cefuroxime, cefaclor, cephalexin, cephydroxil and cefpodoxime proxetil); tetracycline antibiotics (doxycycline and minocycline); macrolide antibiotics (azithromycin, erythromycin, rapamycin and clarithromycin); fluoroquinolones (ciprofloxacin, enrofloxacin, ofloxacin, gatifloxacin, levofloxacin and norfloxacin); antioxidant drugs, including N-acetylcysteine (NAC); anti-inflammatory drugs, such as nonsteroidal drugs (e.g., indomethacin, aspirin, acetaminophen, diclofenac sodium and ibuprofen); steroidal anti-inflammatory drugs (e.g., dexamethasone); antiproliferative agents (e.g., Paclitaxel (Taxol), QP-2, Vincristin, Methotrexat, Angiopeptin, Mitomycin, BCP 678, Antisense c-myc, ABT 578, Actinomycin-D, RestenASE, 1-Chlor-deoxyadenosin, PCNA Ribozym, Celecoxib, sirolimus, everolimus and ABT-578); paclitaxel; and antineoplastic agents, including alkylating agents (e.g., cyclophosphamide, mechlorethamine, chlorambucil, melphalan, carmustine, lomustine, ifosfamide, procarbazine, dacarbazine, temozolomide, altretamine, cisplatin, carboplatin and oxaliplatin), antitumor antibiotics (e.g., bleomycin, actinomycin D, mithramycin, mitomycin C, etoposide, teniposide, amsacrine, topotecan, irinotecan, doxorubicin, daunorubicin, idarubicin, epirubicin and mitoxantrone), antimetabolites (e.g., deoxycoformycin, 6-mercaptopurine, 6-thioguanine, azathioprine, 2-chlorodeoxyadenosine, hydroxyurea, methotrexate, 5-fluorouracil, capecitabine, cytosine arabinoside, azacytidine, gemcitabine, fludarabine phosphate and asparaginase); antimitotic agents (e.g., vincristine, vinblastine, vinorelbine, docetaxel, estramustine); molecularly targeted agents including antibodies, antibody fragments, or carbohydrates/polysaccharides (e.g., imatinib, tretinoin, bexarotene, bevacizumab, gemtuzumab ozogamicin and denileukin diftitox); and corticosteroids (e.g., fluocinolone acetonide and methylprednisolone).

viii. Other Modifications

In some embodiments, nucleic acid nanostructures include modifications that are not related to the nucleic acid sequence of the staple strands or the scaffold sequence. In some embodiments, the nanostructures include polymers or lipids, for example, surrounding or within the space enclosed by the nanostructure. In a particular embodiment, nanostructures include a complete surface coating, for example, by lipids or other polymers (e.g., polyethylene glycol or phospholipids). Such a complete surface coating can also be used to enable the nanostructures to escape immune defense and to permit external modification. Therefore, in some embodiments, the surface of the nanostructure includes an amount of lipid or other polymer effective to coat the nanostructure and reduce immune surveillance or immune uptake of the nanostructure. In some embodiments, the surface of the nanostructure includes an amount of lipid or other polymer effective to enable external modification, for example, by insertion of one or more proteins, lipids, nucleic acids, polymers or small molecules into the lipid or polymer layer reconstituted around the nanoparticle. Preferred polymers are biocompatible (i.e., do not induce a significant inflammatory or immune response) and non-toxic.

Examples of suitable hydrophilic polymers include, but are not limited to, poly(alkylene glycols) such as polyethylene glycol (PEG), poly(propylene glycol) (PPG), and copolymers of ethylene glycol and propylene glycol, poly(oxyethylated polyol), poly(olefinic alcohol), poly(vinylpyrrolidone), poly(hydroxyalkylmethacrylamide), poly(hydroxyalkylmethacrylate), poly(saccharides), poly(amino acids), poly(hydroxy acids), poly(vinyl alcohol), and copolymers, terpolymers, and mixtures thereof.

In preferred embodiments, the one or more hydrophilic polymer segments contain a poly(alkylene glycol) chain. The poly(alkylene glycol) chains may contain between 1 and 500 repeat units, more preferably between 40 and 500 repeat units. Suitable poly(alkylene glycols) include polyethylene glycol, polypropylene 1,2-glycol, poly(propylene oxide), polypropylene 1,3-glycol, and copolymers thereof.

In some embodiments, amphiphilic proteins or other amphiphilic molecules (e.g., drugs) including targeting moieties, or not including targeting moieties, or combinations, are inserted in a lipid layer reconstituted around the nanoparticles.

Some further non-limiting examples include targeting the therapeutic, prophylactic or diagnostic agent to the disease sites for therapeutic and/or diagnostic purposes.

V. Uses

DNA nanostructures prepared according to methods described above are suitable for many applications. Some exemplary uses include drug delivery, biosensors, memory storage, nano-electronic circuitry, etc.

A. Delivery Vehicles

DNA nanostructures are suitable as a delivery vehicle for therapeutic, prophylactic and/or diagnostic agents. Since they are nucleic acid based, DNA nanostructures are entirely biocompatible and elicit minimal immune response in the host. The automated design of any desired geometry of DNA nanostructure further allows manipulation of DNA structure tailored for individual drugs, dose, site of target and desired rate of degradation, etc.

Any prophylactic, therapeutic, or diagnostic agent can be incorporated into the DNA origami nanostructures via a variety of interactions, non-covalent or covalent. Some exemplary interactions for attachment include intercalation, biotin-streptavidin interaction, chemical linkers (e.g., using Click-chemistry groups), or hybridization between complementary nucleotide sequences.

In some embodiments, the agents to be delivered are simply captured inside the DNA origami nanostructures. In these cases, pore size of the DNA polyhedron is a key consideration, i.e., the pores are small enough that the captured agent does not leak out. In some embodiments, the DNA polyhedra are assembled in two halves to allow the capture of agent prior to the completion of the polyhedron nanostructures.

Prior work has shown that DNA origami used as a carrier for anti-cancer drugs such as doxorubicin increased cellular internalization and target cell killing and circumvented drug resistance (Jiang Q et al., Journal of the American Chemical Society 134.32: 13396-13403 (2012)). Small molecules, such as the anti-cancer drug doxorubicin, can attach to the DNA origami structures through intercalation.

1. Agents to be Delivered

In some embodiments, therapeutic, prophylactic, toxic, diagnostic or other agents are delivered using the nucleic acid nanoparticles. Exemplary agents to be delivered include proteins, peptides, carbohydrates, nucleic acid molecules, polymers, small molecules, and combinations thereof. In some embodiments, the nucleic acid nanoparticles are used for the delivery of a peptide drug, a dye, an antibody, or antigen-binding fragment of an antibody.

Therapeutic agents can include anti-cancer agents, anti-inflammatories, or more specific drugs for inhibition of the disease or disorder to be treated. These may be administered in combination, for example, a general anti-inflammatory with a specific biological targeted to a particular receptor. For example, one can administer an agent in treatment for ischemia that restores blood flow, such as an anticoagulant, anti-thrombotic or clot dissolving agent such as tissue plasminogen activator, as well as an anti-inflammatory. A chemotherapeutic which selectively kills cancer cells may be administered in combination with an anti-inflammatory that reduces swelling and pain or clotting at the site of the dead and dying tumor cells. Suitable genetic therapeutics include anti-sense DNA and RNA as well as DNA coding for proteins, mRNA, miRNA, piRNA and siRNA. In some embodiments, the nucleic acid that forms the nanoparticles includes one or more therapeutic, prophylactic, diagnostic, or toxic agents.

2. Delivery of Agents

In some embodiments, therapeutic, prophylactic, toxic, diagnostic or other agents are delivered to a cell or tissue by endogenous uptake of the nucleic acid nanoparticles by the cell or tissue. In some embodiments, the agents are released from the nucleic acid nanoparticles within the blood stream. In other embodiments the agents are released within the gastro-intestinal system, uro-genital system, lymphatic system, central nervous system, or into the skin. The release of agents bound to or otherwise associated with the nucleic acid nanoparticles can occur in vivo, by contact with one or more enzymes, proteins or other factors present in physiological concentrations. Exemplary enzymes include nuclease enzymes, such as exonucleases, endonucleases and other restriction enzymes, proteases, hydrolases and other enzymes. When release of an agent involves a conformational change in the structure of the nucleic acid nanoparticles, the conformational change can occur as a result of exposure to one or more physiological conditions, such as pH, salt concentration or interaction with one or more substances present in the body.

B. Scaffold Structures for Display and Analysis of Molecules

DNA origami nanostructures can act as scaffolds for a variety of molecules including protein, nucleic acid, lipids, or polysaccharides. In some embodiments, one or more molecules are conjugated to the nanoparticles. For example, in some embodiments, one or more molecules are conjugated to the outside of the nanoparticle, to the inside of the nanoparticle, or both to the inside and outside of the particle. Any molecules of interest can be conjugated to the nanoparticles. Exemplary categories of molecule that can be conjugated include proteins, lipids, carbohydrates, small-molecules, nucleic acids and combinations.

In some embodiments nanostructures are used to capture and/or restrain molecules in a fixed and known orientation, for example, to assist biophysical analyses, such as structural determination.

1. Systems for Capturing RNAs for Biophysical Characterization

In some embodiments, polyhedral nucleic acid nanostructures are used as a framework structure of known dimensions that is uniform and stable, for biophysical characterization of one or more molecules of interest. For example, in some embodiments, ribonucleic acid molecules can be captured on a polyhedral nucleic acid nanostructure to orient and display the RNA molecules in a manner amenable to structural or biochemical characterization. An exemplary ribonucleic acid is a viral RNA genome.

A powerful class of highly structured DNA origami objects is obtained with high-throughput synthesis and complexing with unstructured RNAs, enabling their high-resolution reconstruction or low-resolution structural inference by determining complexation with the target structure of interest with complementary bait sequences, protein, or other small molecule affinity tags. The automated approach allows programming, modelling, assembling, and structurally characterizing a broad range of DNA nanostructures using a single scaffold strand combined with specific sets of staple strands, commonly referred to as DNA origami (Castro, C E et al., Nature Methods. 8, 221-229 (2011); Rothemund, P W, Nature. 440, 297-302 (2006); Krishnan, Y et al., Trends in Cell Biology. 22, 624-633 (2012); Pan, K et al., Nature Communications. 5, 5578 (2014)). DNA nanostructures have already shown great potential for light harvesting (Dutta, P K et al., Journal of the American Chemical Society. 133, 11985-11993 (2011); Pan, K et al., Nucleic Acids Research. 42, 2159-2170 (2014)), metallic nanoparticle casting (Sun, W et al., Science. 346, 1258361 (2014)), and biologics delivery (Douglas, S M et al., Science. 335, 831-834 (2012); Fu, J et al., Nature Biotechnology. 30, 407-408 (2012)). This ability to computationally design large libraries of DNA nanostructures of arbitrary size and functionalization sites to “capture” RNA genomes such as HIV allows for rapid prototyping of assemblies that can probe specific points in 3D space.

a. Biophysical Characterization of Bound Molecules

The methods further provide structural characterization of the ribonucleic acids encapsulated by the polyhedral nucleic acid nanostructures. These complexes are suitable for structural characterizations using techniques including chemical foot-printing, cryo-electron microscopy, X-ray crystallography, small-angle X-ray scattering, analytical ultracentrifugation, chromatographic methods, light scattering and combinations thereof. For example, in some embodiments, the methods include chemical foot-printing by selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE), and cryo-electron microscopy for determining secondary and tertiary structures of the ribonucleic acids. Automatically-generated libraries of nanostructures including capture tags or “bait” sequences can be used to produce nanostructures for use in high-throughput methods. For example, object selection and automated design follow the principles that the pores are either big enough to allow diffusion of RNA into the object (as in FIG. 16A), or that the objects are used in two halves that cage the RNA after binding. In preferred embodiments, the computational method takes into account diversity of scaffold sequences, which additionally is used to aid in characterization of nanostructures based on thermal melting analysis. For example, in some embodiments, an RNA structure pipeline is formed which takes an RNA molecule and, through SHAPE analysis and nanostructure binding assays, determines the secondary structure with 3D constraints, with the full 3D structure determined by cryo-EM.

In some embodiments, highly structured DNA origami objects that are complexed with unstructured RNAs are used to enable their high-resolution reconstruction. In preferred embodiments, computationally designed large libraries of rigid nucleic acid structures of arbitrary size and functionalization sites that “capture” RNA genomes, such as HIV, allow for rapid prototyping of assemblies that probe specific points in 3D, similar to the notion of a lock-and-key mechanism. In other embodiments, complexes of nucleic acids and nanostructures are used for detection of structured nucleic acids, for example, RNA molecules, in blood or other biological samples of interest. Complexes of nucleic acids and nanostructures can be read out for bio-sensing applications. For example, “barcodes” tagged onto structured DNA assemblies or other structure-specific tags or ligands can be used as capture agents to select or identify target RNAs of interest. As an analogy, for a “key” that is of unknown shape (i.e., the RNA), the shape is determined by trying it against a combinatorial library of “locks” (i.e., the arbitrary shaped DNA nanostructures) to determine which lock fits the key best. Typically, lock and key interactions occur through avidity binding interactions to arbitrary baits of interest that are structured in 3D space to complement the target structure of interest.

In some embodiments, a library is constructed of DNA nanostructures with complementary regions to find the “locks” that fit the RNA key. In some embodiments, the readout is in the form of high-throughput methods that determine tight binding of the RNA to the DNA nanostructure, together with chemical foot-printing analysis to obtain structural constraints of the RNA on the nanostructure and to ensure no deformation of the key by the lock. In preferred embodiments, programmed DNA nanostructures capture and hold RNAs in native conformations, allowing for library-based conformational probing and uniform particle visualization by cryo-EM.

In some embodiments, this RNA lock-and-key approach is applied to select the best structures for characterization by cryo-EM. In preferred embodiments, the structure is modeled and validated by cryo-EM, with a resolution <20 Å.

Typically, the methods include one or more of the following steps:

(A) Generating a library of DNA nanostructures in silico;

(B) Synthesizing and folding of DNA nanostructures;

(C) Binding of DNA nanostructures to RNAs of interest;

(D) Structural characterization of the DNA nanostructure and RNA complexes.

Optionally, the library of DNA nanostructures generated in silico is further selected to reduce the total number of DNA scaffold and staple strands while maximizing spatial coverage and limiting redundancy, so that synthesis and assembly of the target DNA nanostructures are experimentally practical.
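One way such a selection could be sketched, offered only as an illustration and not as the selection procedure of the described methods, is a greedy coverage heuristic: each candidate design is represented by the set of target regions (or spatial probe points) that it covers, and designs are added one at a time according to how many not-yet-covered regions they contribute. The data layout, the function name `select_library`, and the design names below are hypothetical.

```python
def select_library(candidates, max_designs):
    """Greedy selection of designs that maximizes newly covered regions.

    candidates: dict mapping a design name to the set of region indices
    it covers (a hypothetical representation, assumed for illustration).
    Returns (chosen design names in selection order, covered regions).
    """
    chosen, covered = [], set()
    remaining = dict(candidates)
    while remaining and len(chosen) < max_designs:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining design is redundant
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


if __name__ == "__main__":
    library = {"tet_A": {0, 1, 2}, "tet_B": {2, 3}, "oct_C": {0, 1}}
    print(select_library(library, max_designs=3))
```

In this toy example the design `oct_C` is never selected because it adds no coverage beyond `tet_A`, which is the kind of redundancy the optional selection step is intended to limit. A greedy heuristic is not guaranteed to find the smallest possible library.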

Methods for generating high-throughput nanostructure libraries are described. The methods determine distance constraints of nanostructured RNA to different points of attachment on a nanostructure of known geometry, use chemical foot-printing of RNA on nanostructures, and optimize binding of the RNA to instances of the DNA nanostructure library for ultimate use in cryo-EM, crystallography, or scattering.

An overview of an exemplary high-throughput pipeline for characterizing secondary and tertiary structures of RNAs is shown in FIG. 18.

This is fundamentally different from existing methods for structural and sequence analyses, which do not create scaffolds for RNA presentation and are limited by the large size of RNAs combined with their conformational flexibility. Use of nucleic acid nanostructures of arbitrary size and functionalization sites to “capture” RNA genomes such as HIV allows for rapid prototyping of assemblies that can probe specific points in 3D. Structures of RNAs can be back-determined by screening them against a combinatorial library of nanostructures having differently-arranged binding motifs, to determine which “lock” (i.e., structural conformation) fits the “key” (i.e., RNA) best. In this manner, a library of DNA nanostructures can be generated covering complementary regions, to find the lock(s) that fit the RNA key, as well as for other applications. The nucleic acid scaffolds can include a user-defined scaffold sequence, and the staple sequences are varied accordingly.

The methods can include conducting biophysical analyses of the framework/target molecule complex, such as chemical foot-printing, fluorimetry or colorimetry based read-out, or high-resolution structural characterization using x-rays or cryo-EM.

Methods of structurally and chemically characterizing 3-D structures of single-stranded nucleic acids of interest are provided. In some embodiments, the single-stranded nucleic acid molecules are single-stranded RNA molecules. An exemplary single-stranded RNA molecule is a viral genome. In some embodiments, the single-stranded RNA genome is the HIV genome.

Therefore, methods of encapsulating a viral genome such as the HIV-1 genome using a set of polyhedral nucleic acid nanostructures with single-strand bait sequences complementary to the regions of the single-stranded HIV genome, and subsequent structural characterization, are also provided. Other exemplary structured nucleic acids of interest include messenger RNAs, long non-coding RNAs, and structured genomic segments, such as chromatin, etc.
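As a minimal illustration of what “complementary” means for a single-strand DNA bait directed at an RNA region, the following sketch returns the reverse complement of a chosen RNA window as a DNA sequence written 5′ to 3′. The function name, the window-based interface, and the toy sequence are assumptions for illustration only; the choice of which regions to target (for example, predicted loops) and any consideration of melting temperature or secondary structure are outside the scope of the sketch.

```python
# Watson-Crick DNA partner for each RNA base.
_DNA_PARTNER_OF_RNA_BASE = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_bait_for_rna_region(rna, start, length):
    """Return a DNA bait (5'->3') complementary to rna[start:start+length].

    rna is given 5'->3'; start is a 0-based index (illustrative interface).
    """
    region = rna.upper()[start:start + length]
    if len(region) != length:
        raise ValueError("requested region extends past the end of the RNA")
    # Reverse the region so that the bait is also written 5'->3'.
    return "".join(_DNA_PARTNER_OF_RNA_BASE[base] for base in reversed(region))


if __name__ == "__main__":
    print(dna_bait_for_rna_region("AUGGCUAC", start=0, length=4))
```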

The tertiary structures of the captured RNAs are determined using high-throughput chemical foot-printing and sequencing and high-resolution cryo-electron microscopy, or other optical read-out including fluorimetry or colorimetric assay. Following the natural structural principles that proved helpful in ribosome structure determination, in which proteins act to structure the ribosomal RNA, DNA-based structured elements are added to the HIV genome to generate uniform and orientable RNA objects for high resolution structural characterization. A schematic illustration showing diversity of nanostructure library design is depicted in FIG. 18.

i. Binding Detection Analysis

Detection of stably bound target molecules can be done through a variety of methodologies known in the art, implemented here specifically as part of the method for optimal nanostructure selection from a library. Notable detection methods include quantitative reverse-transcription polymerase chain reaction (qRT-PCR), fluorometric, colorimetric, and calorimetric methods.

In some embodiments, the RNA is bound to nanostructures that are affixed to solid support and then washed multiple times under a variety of conditions including but not limited to increased temperature and decreased salt. Remaining tightly bound RNA is then reverse transcribed and quantitative detection is carried out to determine the exact amount of RNA present against the bound DNA nanostructures.

In some embodiments, RNA affinity can be tested against the library of DNA nanostructures using calorimetric tests including differential scanning calorimetry and isothermal titration calorimetry.

In some embodiments, the binding of the RNA can be detected by fluorescence or colorimetric assays, which include the toe-hold release of a second oligonucleotide that triggers translation of a fluorescent protein (e.g., GFP) or of an enzyme that modifies small molecules to change color (as in Pardee et al., Cell, 159:940-945 (2015)), or fluorescent RNA folding (e.g., RNA spinach aptamer).

ii. Chemical Foot-printing Analysis

In some embodiments, chemical foot-printing analysis is used to obtain structural constraints of nucleic acids of interest bound onto the nucleic acid nanostructures. An exemplary chemical foot-printing technique is selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE).

SHAPE chemistries exploit small electrophilic reagents that react with 2′-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution (Wilkinson K A et al., Nat Protoc. 1(3):1610-6 (2006)). Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems (Smola M J et al., Nat Protoc. 10, 1643-1669 (2015)). In some embodiments, one or more chemical foot-printing methods are used, including exemplary methods such as SHAPE and SHAPE-MaP.

In yet other embodiments, pre-folded RNAs and DNA nanostructures are incubated together and allowed to form complexes. Dimethylsulfate (DMS), N-methyl isatoic anhydride (NMIA), neat DMSO, or buffer is added to the complex samples in each well. The action of DMS or NMIA is to modify the 2′O of the RNA when single stranded (Siegfried, N A et al., Nature Methods. 11, 959-965 (2014); Henderson, R et al., Structure. 20, 205-214 (2012); He, Y et al., Nature. 452, 198-201 (2008)). Subsequently, the sample is desalted and the published technique of SHAPE-MaP is used, where the reverse transcriptase is added along with Manganese (rather than Magnesium) to generate mutations at sites of 2′O modifications (Siegfried, N A et al., Nature Methods. 11, 959-965 (2014)). In some embodiments, next generation sequencing is used to obtain a mutation profile, showing the secondary structure of the bound RNA. This reveals both sites of binding to the object (by absence of mutations compared to the single-stranded unbound modifications) and also the secondary structure constraints on the bound RNA. DNA and RNA concentrations are optimized to ensure high-quality SHAPE data of singly bound complexes. In some embodiments, random baits and internal baits are used as controls, as well as to probe non-specific interactions.
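The comparison described above, in which binding sites are inferred from the absence of mutations in the bound sample relative to the unbound RNA, can be sketched as follows. This is a simplified illustration under stated assumptions and not the published SHAPE-MaP analysis software: per-nucleotide mutation counts and read depths are assumed to be available as lists, the reagent-treated rate is corrected by subtracting the rate from an untreated (e.g., DMSO) control, and the thresholds `min_free` and `max_ratio` are arbitrary placeholder values.

```python
def mutation_rates(mutation_counts, read_depths):
    """Per-nucleotide mutation rate; positions with zero depth give 0.0."""
    return [m / d if d else 0.0 for m, d in zip(mutation_counts, read_depths)]


def background_subtracted(treated_rates, untreated_rates):
    """Subtract the untreated-control rate and clip negatives to zero."""
    return [max(t - u, 0.0) for t, u in zip(treated_rates, untreated_rates)]


def protected_positions(free_signal, bound_signal, min_free=0.01, max_ratio=0.5):
    """Indices that are reactive in free RNA but suppressed when bound.

    min_free and max_ratio are illustrative thresholds, not published values.
    """
    return [
        i
        for i, (free, bound) in enumerate(zip(free_signal, bound_signal))
        if free >= min_free and bound <= max_ratio * free
    ]


if __name__ == "__main__":
    free = [0.02, 0.03, 0.001]    # toy background-subtracted rates, unbound RNA
    bound = [0.02, 0.005, 0.001]  # same positions, RNA bound to a nanostructure
    print(protected_positions(free, bound))
```

In the toy data only the second position loses most of its signal on binding, so it is the only candidate binding site reported; positions that are unreactive in both samples are left unassigned.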

iii. Cryo-Electron Microscopy (Cryo-EM)

Single-particle cryo-EM with subsequent 3D particle reconstruction is a superior method for structural elucidation of designed nanostructures. Cryo-EM is already a proven method for RNA structure determination of the ribosome (Amunts, A et al., Science. 348, 95-98 (2015)), as well as for large viral particles (Wang, Z et al., Nature Communications. 5, 4808 (2014)). In some embodiments, RNAs that have been rigidified by the DNA nanostructures are subsequently subjected to structural characterization, for example, by cryo-EM reconstruction, X-ray crystallography, etc.

In some embodiments, adjustments are made to sequence and geometry of programmed DNA nanostructures to aid in further structural rigidity for increased resolution in cryo-EM. In some embodiments, chirality of the DNA nanostructures is implemented using asymmetry in the nanostructure design or by the addition of duplexes and/or gold nanoparticles to specific locations on the DNA nanostructure. In other embodiments, to aid in structural characterization, gold nanoparticles attached to single-stranded DNA baits specific to other locations in the RNA are used to identify the presence and location of the RNAs during cryo-EM structuring. In further embodiments, sequence routing techniques that allow for use of square or honeycomb edges are implemented, making a much more rigid overall structure. In yet further embodiments, adding in tension and twist to the DNA nanostructure allows for structures to be forced to free energy minima (Wilkinson, K A et al., Nature Protocols. 1, 1610-1616 (2006)).

In some embodiments, RNAs deemed stably attached to the DNA nanostructure are structurally solved by cryo-EM, with SHAPE-determined secondary structure and distance constraints incorporated into the final model. Multiple nanostructure designs allow for a better understanding of native RNA structural fluctuations. In some embodiments, RNA-DNA nanostructures that have identical conformation are generated to allow the enhancement of the signal to noise through particle averaging. In further embodiments, hundreds of thousands of images of molecules with identical conformations but with different orientations are averaged in order to obtain a high resolution cryo-EM structure of asymmetric molecules. The cryo-EM field has the capability to resolve structures of molecular machines to 2-3 Å resolution routinely (Fan, et al., Nature. 527, 336-341 (2015); Wang, et al., Nature Communications, 5, 4808 (2014)).

C. Sensors

DNA origami nanostructures can act as biosensors for a variety of molecules including proteins, nucleic acids, lipids, or polysaccharides. In particular, the DNA origami nanostructures prepared according to the methods described above are capable of adopting any arbitrary shape, making them ideal sensors for other molecules, or for secondary and tertiary structures of other molecules.

For example, DNA origami nanostructures have been shown to act as a DNA repair nanosensor at the single-molecule level (Tintore, et al., Angew Chem Int Ed Engl. 22; 52 (30): pp. 7747-50 (2013)).

In some embodiments, DNA origami nanostructures are used to capture RNA molecules of interest for probing their secondary and tertiary structures. In preferred embodiments, the DNA origami nanostructure:RNA complexes are suitable for further structural analysis, for example using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) or cryo-EM analysis.

In some embodiments, DNA origami nanostructures are designed for binding to a particular RNA virus, for example human immunodeficiency virus (HIV), influenza, Ebola, hepatitis C, SARS, and Zika viruses. In some embodiments, DNA origami nanostructures are used as RNA/viral detection sensors for use in the battlefield.

D. Nanoelectronic Circuitry

DNA nanostructures prepared according to methods described above are suitable for use as nanoscale electronic devices. The automation of DNA nanostructure design allows user input of any desired geometry. The staple strands can be functionalized for incorporating any desired functionalities, such as anchoring to any surface, incorporating any non-naturally occurring molecules, etc.

In some embodiments, metallization of the DNA template is used for circuit fabrication. In preferred embodiments, the shape of DNA origami nanostructures is maintained after the metallization process.

The present invention will be further understood by reference to the following non-limiting examples.

E. Imaging Probes

DNA nanostructures prepared according to the methods described above are suitable for use as a molecular probe, for example, as a fluorescent probe. Based on the capacity to generate structures with random geometries and size, and the facility of modification at prescribed positions determined by the user, fluorescent dyes can be readily conjugated. The number of fluorescent dyes that can be conjugated depends on the structure size and the number of staple strands that can be modified.

In some embodiments, dye-conjugated nucleic acid nanostructures are used for conjugation to specific ligand-binding moieties, such as antibodies, aptamers, protein-binding domains, etc., for example, by integrating chemical groups (e.g., Click-chemistry groups, amine groups, etc.) into the nanostructures. Nanostructures including specific ligand-binding moieties are used for labelling and imaging applications, such as imaging and super-resolution imaging.

F. Light-harvesting and Excitonic Circuits

DNA nanostructures containing densely or loosely packed aggregates of chromophores can be used as excitonic energy transfer circuits. Chromophores of prescribed types can be organized using the 3D arrays published here to form 1D/2D/3D architectures for exciton funneling and transport in nanoscale energy transport.

G. Vaccines and Adjuvants

3D organizations of viral proteins can be used to stimulate the immune system by presenting these proteins in geometries that mimic one or more naturally occurring antigens. Exemplary antigens include viral antigens, parasite antigens, bacterial antigens, allergens or environmental antigens, and tumor antigens. In an exemplary embodiment, the antigen is a natural viral capsid structure.

Specific DNA sequences may also be included as adjuvants, with the 3D patterning in geometry and size controlled in an arbitrary manner using the procedure provided here, in which the DNA wireframe geometry scaffolds viral proteins or peptides or other active fragments. In some embodiments the antigen is a viral antigen. A viral antigen can be isolated from any virus. In some embodiments the antigen is a bacterial antigen. Bacterial antigens can originate from any bacteria. In some embodiments the antigen is a parasite antigen. In some embodiments the antigen is an allergen or environmental antigen. Exemplary allergens and environmental antigens include, but are not limited to, an antigen derived from naturally occurring allergens such as pollen allergens (tree-, herb-, weed-, and grass-pollen allergens), insect allergens (inhalant, saliva and venom allergens), animal hair and dandruff allergens, and food allergens. In some embodiments the antigen is a tumor antigen. Exemplary tumor antigens include a tumor-associated or tumor-specific antigen.

EXAMPLES

Example 1: Fully Automatic and Robust Inverse Design of Programmed DNA Assemblies

Structure-based, rational design of macromolecular assemblies including both nucleic acids and proteins is a long-standing aim of nanotechnology and biological engineering. Unlike proteins, which contain a myriad of specific and non-specific inter-residue interactions that determine their local and global folds, and RNA, which exhibits promiscuity in secondary structure and base-pairing, synthetic DNA assemblies are well established to be highly programmable using Watson-Crick base pairing alone (Seeman, et al., Biophys. J. 44, 201-209 (1983); Rothemund, P W K, Nature, 440, 297-302 (2006)). In particular, wireframe polyhedral geometries offer the powerful ability to program nearly arbitrary 3D geometries on the nanometer scale, limited only by current size constraints imposed by single-stranded scaffold lengths. This important and versatile class of topologies therefore has broad potential for programming complex nanoscale geometries including biomimetic systems inspired by viruses, photosynthetic systems, as well as other natural highly evolved macromolecular assemblies. Achieving full automation of inverse sequence design using this versatile wireframe approach has the potential to realize the original vision of Ned Seeman to program nanoscale materials with full 3D control over positioning of all atomic-level groups (Seeman, N C et al., Biophys. J. 44, 201-209 (1983); Rothemund, P W, Nanotechnology: Science and Computation. 3-21 (2006)).

As an alternative, a robust and fully automatic inverse design procedure is introduced here that programs arbitrary wireframe DNA assemblies based on an input wireframe mesh without reliance on user feedback or limitation to spherical topologies. The procedure has been applied to design 35 Platonic, Archimedean, Johnson, and Catalan solids, six asymmetric structures specified using surface geometry alone, as well as four polyhedra with non-spherical topologies. Designed sequences are used to synthesize icosahedral, tetrahedral, cuboctahedral, octahedral, and reinforced hexahedral structures using asymmetric PCR (aPCR) for facile production of single-stranded scaffolds of custom length and sequence. Programmed objects are confirmed using cryo-electron microscopy (cryo-EM), folding, and stability assays to be both high fidelity structurally as well as stable under low-salt buffer conditions important to biological as well as in vitro applications.

Methods and Materials

Design Formula

Specifying Geometry

The goal of this work is to design nanostructures with a top-down approach: given a target structure of specified size and geometry, as well as a scaffold sequence, the formula will generate the required staple strand sequences to experimentally fold the structure. To specify the geometry, the spatial coordinates of all vertices, the edge connectivities between vertices, and the faces to which vertices belong must be provided. These may be provided manually, or through a file format that specifies polygonal geometry, such as the Polygon File Format (PLY), Stereolithography (STL), or Virtual Reality Modeling Language (WRL). As explained in more detail below, any closed, orientable surface network can serve as input to the formula (FIGS. 2A and 2B). Provided in the code is a parser to convert PLY files into the required inputs.

In addition to the spatial information, the lengths of the edges must be specified, with the constraint that each must be a multiple of 10.5 bp, rounded up or down to the nearest nucleotide, with a minimum of 31 bp. For structures with equal edge lengths throughout the geometry, such as Platonic, Archimedean, or Johnson solids, this is easily satisfied, whereas for other geometries rounding edge lengths may be required, resulting in some possible deviation between the specified target structure and final design. In these cases, the desired minimum edge length (e.g., 31 or 42 bp) is assigned to the shortest edge and the other edges are scaled and rounded appropriately. When using the automated rounding to generate edge lengths, the user is advised to verify that edge lengths are satisfactory before proceeding to the scaffold routing procedure.
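By way of non-limiting illustration, the scaling and rounding of edge lengths described above can be sketched as follows. The function names, the choice to round half-integer values (e.g., 31.5) down rather than up, and the example inputs are assumptions made for this sketch only; they are not taken from the code referred to in this disclosure.

```python
import math

HELICAL_TURN_BP = 10.5   # bp per helical turn of B-form DNA, as stated in the text
MIN_EDGE_BP = 31         # minimum edge length allowed by the design paradigm

def snap_to_helical_turns(length_bp):
    """Snap a length to the nearest multiple of 10.5 bp, then to a whole nucleotide.

    Assumption: half-integer multiples (31.5, 52.5, ...) are rounded down;
    the text allows rounding either up or down.
    """
    turns = max(3, round(length_bp / HELICAL_TURN_BP))  # 3 turns gives the 31-bp minimum
    return math.floor(turns * HELICAL_TURN_BP)

def scale_edge_lengths(geometric_lengths, min_edge_bp=42):
    """Assign min_edge_bp to the shortest edge and scale/round all other edges."""
    if min_edge_bp < MIN_EDGE_BP:
        raise ValueError("minimum edge length must be at least 31 bp")
    shortest = min(geometric_lengths)
    return [snap_to_helical_turns(g / shortest * min_edge_bp)
            for g in geometric_lengths]

if __name__ == "__main__":
    # Hypothetical mesh with two short edges and one edge 1.5x longer.
    print(scale_edge_lengths([1.0, 1.0, 1.5], min_edge_bp=42))
```

As the text advises, the rounded lengths returned by such a routine would be inspected by the user before scaffold routing, since rounding can distort the target geometry.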

Generating the Spanning Tree

In routing the single-stranded scaffold through the entire DNA origami structure, the first requirement is to ensure an Eulerian circuit (Ellis-Monaghan, J A et al., Nat. Comput, 1-13 (2014)). An Eulerian circuit, which is stricter than an Eulerian path, is required because the ends of the scaffold should be adjacent to create a single scaffold nick. In the case of circular scaffold strands, the nick is where the excess scaffold forms a loop.

An Eulerian circuit is guaranteed when the degree of every vertex is even. This can be achieved by using an even number of duplexes per edge in the structure; in this work, two duplexes per edge were chosen, making each edge a DX-tile.
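The even-degree condition can be illustrated with a short sketch. The tetrahedron edge list and the function names below are hypothetical and chosen only for illustration; the check also assumes the network is connected, which holds for the closed surfaces considered here.

```python
from collections import Counter

def vertex_degrees(edges, duplexes_per_edge=1):
    """Degree of each vertex when every edge is rendered as N parallel duplexes."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += duplexes_per_edge
        degrees[v] += duplexes_per_edge
    return dict(degrees)

def has_eulerian_circuit(edges, duplexes_per_edge=1):
    """True if every vertex has even degree (connected network assumed)."""
    return all(d % 2 == 0
               for d in vertex_degrees(edges, duplexes_per_edge).values())

# Tetrahedron: four vertices, each joined to the other three.
TETRAHEDRON = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

if __name__ == "__main__":
    print(has_eulerian_circuit(TETRAHEDRON, duplexes_per_edge=1))  # odd degrees
    print(has_eulerian_circuit(TETRAHEDRON, duplexes_per_edge=2))  # DX edges
```

With one duplex per edge every tetrahedron vertex has degree 3, so no Eulerian circuit exists; with two duplexes per edge each degree doubles to 6 and the condition is met.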

Even though finding an Eulerian circuit is guaranteed, there is a multitude of routing solutions that are all Eulerian circuits. However, not all circuits would lead to effective scaffold routings. For example, a scaffold strand entering a vertex from one edge must exit the vertex from an adjacent edge, one that shares a face with the first edge. It may not exit from the same edge it came from, nor may it exit from a non-adjacent edge. The former case would lead to an edge that is disconnected from the vertex, and the latter case would lead to intersecting DNA strands. This requirement leads to a subset of Eulerian circuits known as A-trails (Bent, S W et al., Discrete Appl. Math. 18, 87-94 (1987)).

With these constraints in mind, there are a few corollaries that follow. Looking at a single edge, crossovers between two helices may be used to strengthen the rigidity of the combined unit (Kallenbach, N R et al., Nature. 305, 829-831 (1983)). Crossovers can occur with staple strands or with scaffold strands. In the case of the design paradigm presented here, which employs two duplexes per edge, there can only be zero or one scaffold crossover per edge. More than one scaffold crossover would lead to internal scaffold loops that are disconnected from the rest of the scaffold. Therefore, a scaffold strand entering an edge from a vertex can either leave the edge from the same vertex or from the other vertex. Similarly, looking at a single vertex, at least one edge connected to the vertex must have zero scaffold crossovers. If all edges at the vertex have scaffold crossovers, an internal loop would be generated. However, a scaffold crossover is in fact necessary. Looking at a single face or any closed circuit of edges, at least one edge must have one scaffold crossover. If all edges have no scaffold crossovers, an internal loop would be generated.

Because scaffold crossovers are required, one way to design a polyhedron would be to identify all the possible locations for a scaffold crossover, and then select which locations shall have the scaffold crossover based on the above criteria. Doing so would lead to a valid structure, and is a viable approach for designing arbitrary scaffolded DNA origami (Pan, K et al., Nat. Commun. 5 (2014)). However, the number of locations for a scaffold crossover scales with the size of the object, making the formula computationally intractable for many structures. Most apparently, structures with the same symmetry have similar routing patterns regardless of size. A very general approach cannot take advantage of this and would iterate through all possibilities of scaffold crossovers, when in fact the solution is predictable and can be reached more quickly.

Given the above restrictions on the number and location of edges with a scaffold crossover for DNA origami objects, solving the scaffold routing can be mapped to a simpler problem: classifying edges as either possessing zero scaffold crossovers or possessing one scaffold crossover.

From the above criteria, the edges with zero scaffold crossovers must connect to every vertex, and there can be no cycles of edges with zero scaffold crossovers, meaning that there are V−1 edges with zero scaffold crossovers, where V is the number of vertices, and the rest have one scaffold crossover. The edges with zero scaffold crossovers meet the definition of a spanning tree of a network (see FIG. 1).

Thus, solving the scaffold routing problem is identical to solving for a spanning tree of the structure, where each possible spanning tree corresponds to a unique scaffold routing. FIGS. 3A-3D present two different spanning trees for a tetrahedron. Given a network, Prim's formula can be used to construct a spanning tree. If, as in this case, all edges are weighted the same, Prim's formula will generate a breadth-first search (BFS) spanning tree, one with the most branches. It has been shown that branching trees self-assemble more reliably than more linear trees (Pandey, S et al., Proc. Natl. Acad. Sci. 108, 19885-19890 (2011)).
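The classification of edges by spanning-tree membership can be sketched as follows. This sketch uses a plain breadth-first search in place of Prim's formula, relying on the statement above that the two coincide when all edges carry equal weight; the function names, zero-based vertex numbering, and tetrahedron input are assumptions for illustration only.

```python
from collections import deque

def bfs_spanning_tree(num_vertices, edges, root=0):
    """Return the edges of a breadth-first spanning tree rooted at `root`."""
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    visited = {root}
    tree = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacency[u]):      # sorted only to make the result deterministic
            if v not in visited:
                visited.add(v)
                tree.append((min(u, v), max(u, v)))
                queue.append(v)
    return tree

def classify_edges(num_vertices, edges, root=0):
    """Split edges into (zero scaffold crossovers, one scaffold crossover)."""
    tree = set(bfs_spanning_tree(num_vertices, edges, root))
    zero = [e for e in edges if tuple(sorted(e)) in tree]
    one = [e for e in edges if tuple(sorted(e)) not in tree]
    return zero, one

if __name__ == "__main__":
    tetrahedron = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    zero, one = classify_edges(4, tetrahedron)
    print("no scaffold crossover:", zero)   # V - 1 = 3 edges
    print("one scaffold crossover:", one)   # E - V + 1 = 3 edges
```

For any connected network with V vertices and E edges, such a routine yields V−1 edges without a scaffold crossover and E−V+1 edges with one, consistent with the counting argument above. Choosing a different root or tree changes the routing but not these counts.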

Note that with this spanning tree formula, there are no restrictions on the topology of the network. Any arrangement of nodes and edges can be routed with an Eulerian circuit, using a spanning tree to define the placement of scaffold crossovers. However, the use of faces allows A-trails to be defined much more easily in automation, and some networks do not have clearly defined faces, i.e., planar faces with an unambiguous outward normal. (An example of this would be eight cubes stacked in a 2×2×2 formation, for a total of 27 vertices. The faces around the vertex at the center are not clearly arranged, making several scaffold routings about that vertex possible.) As such, the current formula is well-suited for any closed surface, which includes not only spherical topologies but also toroidal polyhedra and other geometries with holes.

Adding Pseudo-Nodes and Routing Scaffold

Once the spanning tree has been determined, the graph needs to be converted to an Eulerian circuit (FIG. 1 and FIGS. 3A-3J). First, for each edge that is not in the spanning tree, a pair of pseudo-nodes is added to split the edge into two halves, each corresponding to one side of a scaffold crossover (FIGS. 3E and 3F).

Next, for each vertex in the graph, a set of pseudo-nodes is added to replace the vertex node. A vertex of degree N has N edges emerging from it and N faces between them. For each face, a pseudo-node is placed that joins the two bordering edges and disconnects them from the other edges. After all pseudo-nodes are placed for all vertices, the original vertex nodes are no longer part of the graph, and each edge is now bounded on both ends by pseudo-nodes. This defines the Eulerian circuit through which the scaffold will be routed (FIGS. 3E-3F).

This circuit defines two possible routings: one that routes around faces clockwise, and one that routes around faces counterclockwise, relative to the outward normal. The direction of the scaffold is chosen to run counterclockwise around each face, so that for convex vertices (the majority of cage vertices) the major grooves of the duplexes at each vertex point inward to minimize electrostatic repulsion of the backbone (FIG. 1, and FIGS. 3G-3L) (He, Y et al., Angew. Chem. 122, 760-763 (2010)). The undirected graph is converted to a directed graph to implement this directional choice.

At this point, the lengths of the edges are introduced into the formula; the spanning tree and pseudo-node addition are only geometry-specific, independent of size. The scaffold nick position, for simplicity, is chosen to always be located on an edge without scaffold crossovers, on the duplex far from staple nicks and crossovers. Using Prim's formula, this edge will have Vertex #1 as one of its endpoints, since with the most-branching default all edges connected to Vertex #1 are members of the spanning tree. Marking this 5′-end as scaffold base #1, each of the scaffold bases is subsequently numbered with knowledge of the edge lengths and routing scheme, all while keeping track of its relative position on its edge. Note that for each edge, the 5′-end overhangs the 3′-end by one nucleotide to ensure that all staple and scaffold crossovers remain perpendicular to the helical axes. The half-edges, namely those edges that are split by the scaffold crossover, have lengths that are pre-determined by some simplifying assumptions. The scaffold crossover is placed as close to the center as possible, with a convention set here to have a preference towards the lower-index vertex if needed. Therefore, it is deterministic how long a particular section of a scaffold is on a given edge (FIG. 3M, and Table 1).

TABLE 1. Determining the location of a scaffold crossover for a given edge length. The variables X and Y are defined for the region hybridized by edge staples, ignoring the 10- and 11-bp regions on either side hybridized by vertex staples. By convention set here, X ≤ Y, where X is the region closer to the lower-index vertex, and Y is the region closer to the higher-index vertex.

| X + Y + 21 = L, X ≤ Y | L = 21n + {10 or 11} | L = 21n + 0 |
|---|---|---|
| Edge length L is even | Off-center by 0.5, Y = X + 1 (e.g., L = 52: X = 15, Y = 16) | Off-center by 5.5, Y = X + 11 (e.g., L = 42: X = 5, Y = 16) |
| Edge length L is odd | Off-center by 0, Y = X (e.g., L = 31: X = 5, Y = 5) | Off-center by 5, Y = X + 10 (e.g., L = 63: X = 16, Y = 26) |
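The rules of Table 1 can be expressed compactly. The following is a minimal sketch, assuming only the relations stated in the table (X + Y + 21 = L, X ≤ Y, and the parity-dependent offsets); the function name is hypothetical.

```python
def crossover_split(edge_length_bp):
    """Return (X, Y), the edge-staple regions on either side of the scaffold crossover.

    Implements the Table 1 rules: X + Y + 21 = L with X <= Y, where the
    offset Y - X depends on L mod 21 and on whether L is even or odd.
    """
    L = edge_length_bp
    if L < 31:
        raise ValueError("edge length must be at least 31 bp")
    remainder = L % 21
    if remainder in (10, 11):            # L = 21n + {10 or 11}
        offset = 1 if L % 2 == 0 else 0
    elif remainder == 0:                 # L = 21n + 0
        offset = 11 if L % 2 == 0 else 10
    else:
        raise ValueError("edge length is not a rounded multiple of 10.5 bp")
    x = (L - 21 - offset) // 2
    return x, x + offset

if __name__ == "__main__":
    for length in (31, 42, 52, 63):
        print(length, crossover_split(length))
```

For the four edge lengths used as examples in Table 1, this returns (5, 5), (5, 16), (15, 16), and (16, 26), respectively.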

Adding Staple Strands and Generating Sequence

Each scaffold base now has two pieces of information associated with it: one index number indicates its position on the scaffold strand, and a set of numbers indicates its spatial location: the edge, the duplex, and the position from the 5′ end. In routing the staple strands, the latter set is used to identify which bases in the staples are paired with which bases in the scaffold, then the former index number is assigned to the staples accordingly.

There are three categories of staple strands, each with their own prescribed pattern: staples on vertices, staples on edges with scaffold crossovers, and staples on edges without scaffold crossovers. The staples on vertices pair with the first 10-11 nucleotides of each duplex abutting the vertex, with poly-T bulges of length 5 crossing between edges. There are two varieties of vertex staple designs implemented: one system uses single crossovers in some places to ensure that there is 10-11 bp of continuous duplex for high specificity and binding strength, and the other, more traditional, system uses double crossovers everywhere, leading to a minimum of 5 bp of continuous duplex (He, Y et al., Nature. 452, 198-201 (2008); Zhang, F et al., Nat. Nanotechnol. 10, 779-784 (2015)). For the structures synthesized and characterized in this work, the former paradigm is used, as the higher binding strength was found to create a more cooperative transition at a higher temperature (FIGS. 9I-9L). The pattern of staple routing depends on the degree of the vertex, ensuring that each staple is 52 or 78 nucleotides (nt) long for ease of synthesis.

$a = \begin{cases} 0, & \text{if } n \bmod 3 = 0 \\ 2, & \text{if } n \bmod 3 = 1 \\ 1, & \text{if } n \bmod 3 = 2 \end{cases} \qquad (\text{Eq. 1})$

$b = \frac{n - 2a}{3} \qquad (\text{Eq. 2})$

where a is the number of 52-nt staples at the vertex,

b is the number of 78-nt staples at the vertex, and

n is the degree of the vertex.
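As a worked illustration of Eq. 1 and Eq. 2, the following sketch computes the number of 52-nt and 78-nt vertex staples for a vertex of a given degree. The function name and the lower bound on the degree are assumptions of this sketch.

```python
def vertex_staple_counts(degree):
    """Return (a, b): the numbers of 52-nt and 78-nt staples at a vertex (Eq. 1, Eq. 2)."""
    if degree < 2:
        raise ValueError("vertex degree must be at least 2")
    a = {0: 0, 1: 2, 2: 1}[degree % 3]   # Eq. 1
    b = (degree - 2 * a) // 3            # Eq. 2 (the division is always exact)
    return a, b

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        a, b = vertex_staple_counts(n)
        # Rearranging Eq. 2 gives 2a + 3b = n for every vertex degree.
        print(f"degree {n}: {a} x 52-nt, {b} x 78-nt")
```

For example, a degree-3 vertex uses one 78-nt staple, a degree-4 vertex uses two 52-nt staples, and a degree-5 vertex uses one of each.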

The edge staples pair with the intermediate nucleotides between vertex staples. For the edges with scaffold crossovers, two 31-32-nt staples are placed across the scaffold crossover, together occupying a 15-16-nt region on either side of the crossover for sufficiently strong binding. The remainder of the scaffold has 42-nt staples placed to create staple crossovers every 21 base pairs, with a 20- or 22-nt staple in the case of a 10- or 11-nt remainder. The edges without scaffold crossovers simply follow this latter pattern, filling with as many 42-nt staples as can fit and using a 20- or 22-nt staple when necessary (FIG. 3N).

The minimum edge length allowed in this design paradigm is 31 bp. Any smaller value will place a scaffold crossover 5 nt away from the end of an edge (in the vertex staple region) and would not lead to a high yield. However, the rules described above work well for lengths 42 bp and greater, and they need to be modified slightly for 31- and 32-bp edges. First, a 31/32-bp edge has 21 bp occupied by vertex staples, leaving 10 or 11 bp for edge staples. Therefore, in both types of edges, a 20- or 22-nt staple is placed with a single crossover on one side, because a staple nick in the middle would conflict with the scaffold crossover. This in turn means that the single-crossover vertex staple design may lead to a missing crossover, so to be safe the double-crossover vertex staple design is always used in any structure with a 31- or 32-bp edge present.

After all the staples are placed, each staple is a vector of numbers, each value corresponding to the scaffold nucleotide to which it is base paired. Then, the input or generated scaffold sequence is used, matching a base identity (A, T, G, or C) to a scaffold number. If no sequence is provided, a segment of M13mp18 is used by default if the required scaffold length is less than 7249 nucleotides, and a sequence is randomly generated if the required length is greater. The complementary nucleotide via Watson-Crick base pairing is then computed and assigned to the corresponding staple nucleotides. Finally, this list of staple sequences is output for synthesis.
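The final sequence-assignment step can be sketched as below. The sketch assumes that each staple vector lists 1-based scaffold indices in the staple's own 5′ to 3′ order, and that unpaired poly-T bulge positions are marked with `None`; both conventions, as well as the function name and the toy scaffold, are assumptions made for illustration and are not taken from the disclosed code.

```python
# Watson-Crick complements for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def staple_sequence(scaffold_sequence, staple_indices):
    """Translate a staple's vector of scaffold indices into its base sequence.

    staple_indices: 1-based scaffold positions in staple 5'->3' order
    (assumed convention); None marks an unpaired poly-T bulge nucleotide.
    """
    bases = []
    for index in staple_indices:
        if index is None:
            bases.append("T")                        # unpaired bulge position
        else:
            bases.append(COMPLEMENT[scaffold_sequence[index - 1]])
    return "".join(bases)

if __name__ == "__main__":
    toy_scaffold = "ATGCATGC"                        # hypothetical 8-nt scaffold
    print(staple_sequence(toy_scaffold, [4, 3, 2, 1]))
    print(staple_sequence(toy_scaffold, [2, 1, None, None, 8]))
```

Because the staple vector already encodes the antiparallel pairing order, no explicit sequence reversal is needed in this representation.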

Predicting 3D Structure

The positions of each base pair are calculated by interpolating between the two ends of the edge it resides on, and shifting away perpendicularly from the central axis by 10 Å, half the interhelical distance for an anti-parallel crossover. The edge is assumed to lie in a plane with a normal vector defined by the sum of the unit normal vectors of the two neighboring faces.

There are several ways to define the location of the ends of the edges. The DX-tile edges can be assumed to be two parallel cylinders with combined width 40 Å (20 Å inter-helical distance and 20 Å duplex diameter). This can be further simplified to a rectangle with width 40 Å, with the line of the edge serving as a central axis (FIGS. 5A-5F). In the ideal case, the corners of these rectangles meet, since the scaffold exits and enters the edge from these locations. The widths of the rectangles together would form an N-sided regular polygon, because they have the same sides and have equal angles between them. The perpendicular distance from the center of this polygon to an edge (the beginning of the interpolation) is the inradius of this polygon. From the inradius, the distance between the vertex and the beginning of the DX-tile edge is determined using the sum of the face angles. If the multi-arm DX-tile were flat, this would be equivalent to the inradius.

$s = \frac{2\pi}{\theta_{tot}}\, r \qquad (\text{Eq. 3})$

where s is the distance between the vertex and the beginning of the DX-tile edge,

r is the inradius of the polygon formed by the widths of the tiles, and

θ_(tot) is the sum of all face angles at the vertex.

For regular N-sided polygons,

$r = \frac{w}{2} \cot\left( \frac{\pi}{N} \right) \qquad (\text{Eq. 4})$

where w is the combined width of the DX-tile (40 Å).

There are some structures, however, whose edges do not meet at regular angles, such as the Archimedean solids. In that case, depending on the convention used to define the length of the inradius, there will be backbone stretches or nucleotide overlaps (FIGS. 5A-5F). For a representative Archimedean solid, the size of the object is best fit when backbone stretches are minimized, where the inradius is calculated based on the largest face angle.

$r = \frac{w}{2} \cot\left( \frac{\theta_{\max}}{2} \right) \qquad (\text{Eq. 5})$

where θ_(max) is the largest face angle. Note that this general equation applies to regular N-sided polygons as well, since θ_(max)=2π/N.

For structures with concave vertices, where θ_(tot)>2π, to obey the convention that all edge axes meet at a single point, it is defined that s=r, creating a sphere of radius r that defines the edge boundaries.

A schematic representation of the workflow including each of the steps to implement the top-down design of nucleic acid nanostructures is depicted in FIG. 1.
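Eq. 3 through Eq. 5 and the concave-vertex convention can be evaluated numerically as in the following sketch. The function names are hypothetical, angles are in radians, and how θ_(max) and θ_(tot) are measured for a given vertex is left to the caller; the sketch only evaluates the stated equations.

```python
import math

DX_TILE_WIDTH = 40.0  # combined width w of the DX-tile in angstroms, per the text

def inradius_regular(num_arms, width=DX_TILE_WIDTH):
    """Eq. 4: inradius of the regular N-sided polygon formed by the tile widths."""
    return (width / 2.0) / math.tan(math.pi / num_arms)

def inradius_from_angle(theta_max, width=DX_TILE_WIDTH):
    """Eq. 5: inradius based on the largest face angle theta_max."""
    return (width / 2.0) / math.tan(theta_max / 2.0)

def vertex_offset(r, theta_tot):
    """Eq. 3: distance s from the vertex to the start of the DX-tile edge.

    For concave vertices (theta_tot > 2*pi) the convention s = r is applied.
    """
    if theta_tot > 2.0 * math.pi:
        return r
    return (2.0 * math.pi / theta_tot) * r

if __name__ == "__main__":
    r_flat = inradius_regular(4)                     # flat four-arm junction
    print(r_flat, inradius_from_angle(2.0 * math.pi / 4))
    print(vertex_offset(r_flat, 2.0 * math.pi))      # flat case: s equals r
    print(vertex_offset(r_flat, math.pi))            # smaller angle sum: s = 2r
```

For a flat four-arm junction, Eq. 4 and Eq. 5 agree and s reduces to the inradius, as stated above; as the sum of face angles decreases, the edge start moves farther from the vertex.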

Results

To enable the fully automatic and robust inverse design of programmed DNA assemblies in order to pattern 3D geometries of arbitrary lipids/proteins/sugars/RNAs/PNAs scaffolded using DNA nanostructures, arbitrary geometries were rendered as node-edge networks based on the DX-based wireframe motif in which inter-connected edges consist of two duplexes joined using anti-parallel (DX) crossovers (FIG. 1) (Rothemund, Nanotechnology: Science and Computation. 3-21 (2006); He, et al., Nature. 452, 198-201 (2008); Zhang, et al., Nat. Nanotechnol. 10, 779-784 (2015); Yan, et al., Science. 301, 1882-1884 (2003); Fu, et al., Biochemistry (Mosc.). 32, 3211-3220 (1993)). This strategy offered application of the procedure to any closed geometric surface including non-spherical topologies such as a torus, provided that it can be rendered using polyhedral surface meshes. Using this approach, the spatial coordinates of all vertices, the edge connectivities between vertices, and the faces to which vertices belong fully specify the target object. Standard polyhedron file formats containing this information are converted into this set of arrays, providing input to the scaffold routing and staple design procedure (FIGS. 2A and 2B). Programmed edges are required to consist of multiples of 10.5 bp rounded to the nearest nucleotide, as commonly assumed in DNA origami design to satisfy the natural helicity of B-form DNA. Obeying the natural geometry of DNA ensures that no over- or under-winding of duplexes occurs, which may otherwise result in shape distortions that force deviation from the target geometry and would require iterative, ad hoc adjustment of edge lengths and sequence design (Yan, et al., Science. 301, 1882-1884 (2003)).

Representing the target geometry as a polyhedral mesh that satisfies the preceding design criteria guarantees that a single-stranded scaffold can be routed uniquely throughout the entire object using an Eulerian circuit, without modifications to the target geometry. From the mesh, the graph of the target structure is computed, containing the vertex, edge, and face information (FIG. 1). Scaffold routing is then assigned using a spanning tree and Prim's formula (Prim, R C, Bell Syst. Tech. J. 36, 1389-1401 (1957)). Each of the edges that is a member of the spanning tree corresponds to an edge without a scaffold crossover, whereas the remainder have one scaffold crossover. Every possible spanning tree for a given graph corresponds to a unique scaffold routing, where by default Prim's formula generates a maximally-branching spanning tree that has been shown to self-assemble more reliably than linear trees (FIGS. 3A-3L) (Pandey, S et al., Proc. Natl. Acad. Sci. 108, 19885-19890 (2011)). Importantly, use of DX arms to represent edges also ensures that a solution is obtained efficiently, in time on the order of E log V, where E and V are the number of edges and vertices, respectively (Prim, R C, Bell Syst. Tech. J. 36, 1389-1401 (1957)). To complete the scaffold routing, a scaffold crossover is placed at the center of each edge that is not part of the preceding spanning tree. A linear scaffold nick position is set to ensure that it is non-coincident with crossovers and other nicks, with the polarity of its routing chosen to be counter-clockwise around each face due to the preference of the major groove to orient inwards at vertices (He, Y et al., Angew. Chem. 122, 760-763 (2010)). Thus, use of the spanning tree approach enables fully automatic conversion of the input polyhedral geometry to full scaffold routing based on the single circuit that traverses each duplex once.

With the scaffold routing determined, staple strands are assigned automatically using distinct rules for edge versus vertex staples, enabling the assignment of staple strand sequences assuming Watson-Crick base complementarity (FIG. 1, and FIGS. 3L-3M). Finally, the positions and orientations of each nucleotide are modeled to predict the 3D structure of the nanoparticle (FIG. 1, FIGS. 4A-4H and FIG. 13). Critically, in contrast with previous tile-based approaches that employed this DX-motif to synthesize nanoparticles of diverse form (He, Y et al., Nature. 452, 198-201 (2008)), the use of a single-stranded DNA scaffold that is routed throughout the entire object in the strategy offers quantitative yield of the final product in its self-assembly, no dependence on relative multi-arm junction tile concentrations, and full control over DNA sequence. This final feature is essential to biomolecular applications that utilize spatially specific asymmetric sequence programming for protein or RNA scaffolding as well as other chemical functionalization.

To test the generality and robustness of the design procedure when applied to diverse polyhedral geometries, it was first applied to design Platonic solids that have equal edge lengths, angles, and vertex-degree, followed by geometries of increasing complexity including Archimedean solids with unequal vertex angles, Johnson solids that include heterogeneity in vertex degree, and Catalan solids that have unequal edge lengths (FIGS. 2A and 2B). Applicability to asymmetric and non-convex objects specified using surface geometry alone (Yan, H et al., Science. 301, 1882-1884 (2003)) was also confirmed (FIG. 1, and FIG. 4), in addition to non-spherical topologies including a nested cube, a nested octahedron, and tori that have not been realized experimentally (Zhang, F et al., Nat. Nanotechnol. 10, 779-784 (2015); He, Y et al., Angew. Chem. 122, 760-763 (2010)) and cannot be solved computationally using existing procedures (Yan, H et al., Science. 301, 1882-1884 (2003)). Taken together, these examples illustrate the broad ability of the procedure to automatically generate complex scaffold and staple routings for diverse geometries based on top-down geometric specification alone (FIGS. 2A and 2B).

Example 2: Automated Inverse Design of Programmed DNA Assemblies with Modified Geometry

Methods and Materials

To create nanostructures having modified geometry at the edges and vertices, parameters specifying both the number of nucleic acid helices and cross-sectional geometric pattern are input. A modified and enhanced procedure related to the methods described in Example 1 is carried out to specify scaffold routing throughout the desired geometry at each edge of the nanostructure. In addition, the geometry of each vertex can be defined as being flat (i.e., non-beveled) or having beveled edges, to accommodate the cross-sectional geometry of each edge as it enters and leaves the vertex.

Modifying Geometry

To route a single-stranded scaffold throughout the entire geometry with edges formed from 6 double helices arranged in honeycomb morphology (6HB edges), geometric manipulation to generate six lines per edge is carried out. Each edge is detached from the two vertices connected to it and its length is shortened (FIG. 14A), with the offset-distance (between the vertex and end point of the shortened edge) calculated from the angle between two adjacent edges at the vertex and the distance between two helices where the origin of the local coordinate is located. After this, the connectivity of each separated line is newly constructed with two conjugated endpoints. The number of separated points is twice the number of separated lines regardless of the number of vertices. Then, at the center of every separated line, three vectors, t₁-t₂-t₃, are introduced as basis vectors of the local coordinate system to determine the optimal translation and orientation of helices along the edges. The vector t₁ of the local coordinate system is defined by the position vectors of the two points from the connectivity of each separated line, and is parallel to each helical axis. The vector t₂ is the average of the two normal vectors that are perpendicular to each face interfaced with the local vector t₁. The vector t₃ is obtained by the cross product of the two vectors t₁ and t₂.
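The construction of the local coordinate system can be sketched as follows. This is an illustration only: the function name `local_basis`, the normalization of each vector to unit length, and the example coordinates are assumptions that are not stated in the description above.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_basis(p_start, p_end, normal_a, normal_b):
    """Return (t1, t2, t3) for one separated line.

    t1: direction from p_start to p_end (parallel to the helical axes).
    t2: average of the normals of the two faces sharing the edge.
    t3: cross product of t1 and t2.
    Unit-length normalization is an assumption of this sketch.
    """
    t1 = normalize(tuple(e - s for s, e in zip(p_start, p_end)))
    t2 = normalize(tuple((a + b) / 2.0 for a, b in zip(normal_a, normal_b)))
    t3 = cross(t1, t2)
    return t1, t2, t3

# Hypothetical edge along x, shared by two faces with normals along y and z.
t1, t2, t3 = local_basis((0, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 1))
print(t1, t2, t3)
```

Because both face normals are perpendicular to the shared edge, their average is perpendicular to t₁, so the three vectors are mutually orthogonal in this example.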

Before converting each separated edge into six-lines modeling six-duplexes, the cross-section of a honeycomb lattice shape is defined as six circles on two local vectors, t₃-t₂. To avoid clashing between neighboring duplexes, which occurs depending on the dihedral angle between two adjacent faces of the structure, two possible layouts are used: the bottom origin, in which the origin of two local vectors is located at the center of two bottom circles (hereafter called reference circles), and the middle origin, in which two middle circles are used as reference circles. The reference circle is only connected to the closest reference circle of the adjacent separated line sharing the same face. In a t₁-t₃ cross-section view of the honeycomb helix lattice for any t₁-depth, each circle has a diameter of 2.25 nm, which is slightly larger than the 2 nm diameter of double-helical B-form DNA. The difference reflects the electrostatic repulsion between neighboring duplexes, which may lead to a somewhat larger effective inter-helical spacing of 2.25 nm, as assumed previously (Dietz, H et al., Science 325, 725-30 (2009); Castro, C. E. et al., Nat. Methods 8, 221-229 (2011); Pan, K. et al., Nat. Commun. 5, 5578 (2014)). The integer number (called a section ID) on each circle is assigned to model the asymmetric ends of DNA strands; the circle is the 5′-end having a terminal phosphate group when it is an even number and the circle is the 3′-end having a terminal hydroxyl group when it is an odd number. Then, according to the section ID, each separated edge is replaced with six arrow lines with endpoints. For even-numbered circles, the arrow points in the same direction as t₁. For odd-numbered circles, the arrow points in the opposite direction to t₁. Since the arrow will be substituted with scaffold running 5′-end to 3′-end and staple running in the 3′-end to 5′-end direction, each double helix has a nearest neighbor where the strands have opposite parity. Thus, six antiparallel cross-linked DNA helices are employed as the six-line segments between every two adjacent vertices of the initial geometry. Subsequently, the minimum length line among all separated lines is found and scaled (e.g., multiple of 21-bp greater than 42-bp edge length). The minimum edge length allowed in this six-helix bundle design paradigm is 42-bp since more than two double-crossovers between adjacent duplexes are guaranteed.

Finding Distributed Scaffold Double-Crossovers

The next step is to generate the loop-crossover structure (FIG. 14B). The endpoints of multiple line segments are joined (FIG. 14C) such that every line becomes part of a loop. The endpoint of the line segment generated by the reference circle is connected with the closest endpoint of the adjacent line segment from the reference circle (bottom origin in this example). The four remaining endpoints are interconnected with each other according to the following rule: two endpoints are diagonally connected in the bottom origin and horizontally connected in the middle origin. Thus, the initial geometry consisting of N_(F) faces and N_(E) edges introduces N_(F)+2×N_(E) closed loops, in which the N_(F) loops originate from the connection by the red line and the 2×N_(E) loops are generated by connecting the endpoints by the blue line.

For each base-pair with the nucleic acid scaffold, the relative positions and angles of each nucleotide are modeled to find the possible crossovers in both scaffold and staples. First, the length of line segments is discretized as a multiple of the base-pair length, 0.34 nm, which is the base-pair rise in the double-helical B-form DNA model. A certain number of base-pairs at the end of discretized line segments, except for the line segments originating from the reference circle, are added or deleted to find the nearest position of the scaffold crossover. For instance, in the case of the bottom origin, discretized lines from a section ID of 2 or 3 are moved 3-bp in the 3′-end direction, and those from a section ID of 4 or 5 are moved 1-bp in the direction of the 5′-end to connect to each other at the permitted position of the scaffold crossover. For the middle origin, all of the discretized lines except for those from a section ID of 3 are shifted by 1-bp in the 3′-end direction, thus the cross-sectional shape is relatively flat compared to the one formed by the bottom origin. Given the helical periodicity of 10.5 nucleotide pairs per turn, there are two possible starting blocks, the prior and posterior block, for the base-pair at the 5′-end which belongs to the section ID of 1. With the block ID of 0, the initial angle pointing to the nucleotide of the scaffold is defined as 270° for the even section ID and 270° for the odd section ID when applying the bottom origin. The initial angle with the block ID of 0 is also defined as 30° for the even section ID and 210° for the odd section ID when applying the middle origin. Thus, when considering a section ID of 3 and the bottom origin, the prior block with the ID of 3 and the posterior block with the ID of 13 have angles of 167.1° (270°−360°×2/21×3) and 184.3° (270°−360°×2/21×13), respectively, which point in almost the same orientation and can be connected without any unpaired nucleotides of the scaffold. The base-pair with different starting blocks results in different patterns of scaffold and staple crossovers, which affect the final sequence design even when applying the same staple-break rule. We adopted and used the posterior block as the starting block in both bottom and middle origin since it has more of the 14-nt seed dsDNA domains whose presence enhances folding yield (Ke, Y. et al., Chem. Sci. 3, 2587 (2012); Martin, T. G. et al., Nat. Commun. 3, 1103 (2012)).
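The two angles quoted above follow directly from the helical periodicity of 10.5 base pairs per turn (360°×2/21 per base pair). The short sketch below reproduces the arithmetic; the function name `scaffold_angle` and the wrapping of the result into the range 0° to 360° are assumptions made for illustration.

```python
def scaffold_angle(initial_angle_deg, block_id, bp_per_turn=10.5):
    """Angle of the scaffold nucleotide at a given block ID.

    Each base pair rotates the helix by 360/10.5 degrees (= 360*2/21).
    The result is wrapped into [0, 360) for easier comparison.
    """
    return (initial_angle_deg - 360.0 / bp_per_turn * block_id) % 360.0

# Bottom origin, section ID 3: initial angle 270 degrees at block ID 0.
prior = scaffold_angle(270.0, 3)       # prior block, ID 3
posterior = scaffold_angle(270.0, 13)  # posterior block, ID 13

print(round(prior, 1), round(posterior, 1))  # 167.1 184.3
print(round(posterior - prior, 1))           # angular separation in degrees
```

The two blocks differ by about 17°, which is why they point in almost the same orientation.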

Scaffold double-crossovers are found and introduced by connecting two closed loops (scaffold strand), creating the loop-crossover structure (FIG. 14C). Possible staple crossovers are restricted to intersections between the block and every third layer of a stack of planes orthogonal to the helical axes, spaced apart in intervals of 7-bp or two-thirds of a turn (Douglas, S. M. et al., Nucleic Acids Res. 37, 5001-6 (2009)). Then, possible scaffold crossovers are permitted at positions displaced upstream or downstream of the corresponding possible staple crossover points by 5-bp or a half-turn, except for those that are 7-bp away from both ends of the discretized line. Only two double-crossovers are selected per section 0 to 5 in a zig-zag pattern with respect to the center edge, and the double-crossover connecting its own loop is eliminated to avoid cycles. FIG. 14C shows scaffold loops with initial double-crossovers (double line) of the tetrahedron DNA origami structure with the 6HB as the edge, in which N_(F)+2×N_(E)−1 double-crossovers among them are selected through the following scaffold route process.

Generating the Spanning Tree of the Dual Graph of the Loop-Crossover Structure

In routing the single-stranded scaffold through the entire DNA origami structure, the first requirement is to ensure an Eulerian circuit exists (Ellis-Monaghan, J. A. et al., Nat. Comput. 14, 491-503 (2015)). An Eulerian circuit, which is stricter than an Eulerian path, is required because the ends of the scaffold should be adjacent to create a single scaffold nick. An Eulerian circuit is guaranteed when the degree of every vertex is even. This can be achieved by using an even number of duplexes per edge in the structure; in this work, we have chosen to use six duplexes per edge, each a six-helix bundle. Since the degree of every corner connected by the crossover and loop always remains two (even), it becomes an Eulerian circuit by choosing the proper number of double-crossovers of the loop-crossover structure. Thus, the scaffold routing problem can be solved by computing a spanning tree of the dual graph of the loop-crossover structure, which determines the proper number of double-crossovers without any cycle (a route of edges and nodes wherein a node is reachable from itself). In order for the loop-crossover structure that employs six duplexes per edge to be an Eulerian circuit, N_(F)+2×N_(E) closed loops should be connected to each other by N_(F)+2×N_(E)−1 double-crossovers. This implies that each edge is constructed from six duplexes with two or three scaffold double-crossovers, which are determined by the spanning tree calculation.
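The counting argument above can be checked for a concrete shape. The sketch below uses a tetrahedron (4 faces, 6 edges); the function name `loop_crossover_counts` is hypothetical, and the number of edges carrying three double-crossovers is derived here under the assumption, taken from the text, that every edge carries either two or three.

```python
def loop_crossover_counts(num_faces, num_edges):
    """Count loops and scaffold double-crossovers for six-helix-bundle edges.

    loops      = N_F + 2 * N_E        closed loops in the loop-crossover structure
    crossovers = loops - 1            edges of a spanning tree over those loops
    three_dx   = crossovers - 2 * N_E edges with three double-crossovers,
                 assuming every edge has either two or three of them
    """
    loops = num_faces + 2 * num_edges
    crossovers = loops - 1
    three_dx = crossovers - 2 * num_edges
    return loops, crossovers, three_dx

# Tetrahedron: 4 faces, 6 edges.
loops, crossovers, three_dx = loop_crossover_counts(4, 6)
print(loops, crossovers, three_dx)  # 16 15 3
```

Under that assumption the number of edges with three double-crossovers always equals N_(F)−1, so a tetrahedron has three such edges and three edges with two.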

In order to consistently select two or three double-crossovers for each edge, a weight factor is assigned to each double-crossover, with the value of 1 for the two mandatory double-crossovers, the value of 2 for the occasional double-crossover, and the value of 3 for the unwanted double-crossover. Despite having twelve ways to impose the weight factor of the double-crossover connecting two adjacent loops, we chose to adopt pattern #1 for the bottom origin and pattern #13 for the middle origin since the final staples with this pattern include more 14-nt seed dsDNA domains.

Then, the dual graph of the loop-crossover structure is generated (FIG. 14D), in which each loop becomes represented by a node and each double-crossover becomes represented as an edge joining the analogous nodes and transferring the assigned weight factor of the double-crossover. Given the dual graph network, Prim's or Kruskal's algorithm can be used to find the minimum weight spanning tree in which N_(F)+2×N_(E)−1 edges are determined with the priority of small weight factor of the edge. The edges that are members of the spanning tree correspond to the subset of double-crossovers required to complete the Eulerian circuit.
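A minimal sketch of the minimum weight spanning tree step, using Kruskal's algorithm with a union-find structure, is given below. The four-node dual graph and its weights are invented for illustration and do not correspond to any structure in the figures; only the weight values 1, 2, and 3 follow the weighting scheme described above.

```python
def kruskal(num_nodes, weighted_edges):
    """Minimum weight spanning tree by Kruskal's algorithm.

    weighted_edges: iterable of (weight, node_u, node_v).
    Returns the selected edges in the order they were accepted.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # joining two separate components: no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

# Hypothetical dual graph: nodes are scaffold loops, edges are candidate
# double-crossovers weighted 1 (mandatory), 2 (occasional), or 3 (unwanted).
dual_edges = [(1, 0, 1), (1, 1, 2), (2, 2, 3), (3, 0, 3), (2, 0, 2)]
selected = kruskal(4, dual_edges)
print(selected)  # [(1, 0, 1), (1, 1, 2), (2, 2, 3)]
```

The low-weight double-crossovers are accepted first, and the weight-3 candidate is left out because all four loops are already connected without it.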

Inverting the Spanning Tree and Completing Scaffold Route

Once the spanning tree of the dual graph network has been determined, the graph is inverted back to the loop-crossover structure only using members of the spanning tree (FIGS. 14E-14F). By choosing a particular subset of double-crossovers in the loop-crossover structure, these discrete loops can be connected to form one continuous circular scaffold through the entire structure. The direction of the circular scaffold is set to have the same direction defined by the corresponding section ID, and the nick position is chosen to be placed on the duplex far from crossovers and staple nicks, which reveals the final scaffold routing. In the case of the design paradigm presented here, which employs six duplexes per edge, one single-stranded scaffold can go through the entire structure and visit each edge of the initial geometry two times, with every edge having two or three scaffold double-crossovers.

Adding Staple Strands and Sequences

In the next step, the staple strands wind in antiparallel direction around the scaffold to assemble B-form double helices and the staple sequences can be computed based on complementary Watson-Crick base pairing with the scaffold sequence.

First, staple paths complementary to the scaffold are assigned by adding all permitted staple double-crossovers except for those that would not be 5-bp away from a scaffold crossover between the same two helices and not 7-bp away from both ends of discretized lines in the base pair model. The two staples crossing between edges are connected with a certain number of nucleotides forming poly-T bulges where the staple paths do not bind to the scaffold, which serve to prevent blunt-end stacking. Since a phosphate-phosphate distance of roughly 0.7 nm is known for B-form DNA (Rich., Proc. Natl. Acad. Sci. U.S.A 95, 13999-14000 (1998)), the number of unpaired nucleotides in the poly-T bulge is calculated by dividing the spatial distance between the two nucleotides to be joined by 0.4 nm (a value smaller than 0.7 nm is used to reduce the tension in the connection). Second, the initial staple paths built from permitted staple crossovers and the poly-T bulge can be non-circularized after placing a nick at the center of the longest dsDNA domain and where it is non-coincident with staple and scaffold crossovers. Lastly, the non-circular staple paths are broken into shorter segments 20 to 60 nucleotides long, usually with a mean length of about 40 nucleotides. With the design criterion of including at least one 14-nt seed domain per staple, we suggested and investigated two alternative staple-break rules called “maximized staple length” and “maximized number of seed domains”.
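The poly-T bulge length calculation can be sketched as below. The division by 0.4 nm comes from the description; rounding the quotient to the nearest whole nucleotide is an assumption, since the rounding convention is not stated, and the function name and example distances are hypothetical.

```python
def poly_t_bulge_length(distance_nm, spacing_nm=0.4):
    """Number of unpaired T nucleotides bridging two staple ends.

    distance_nm: spatial distance between the two nucleotides to be joined.
    spacing_nm:  per-nucleotide span used for the division (0.4 nm).
    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(distance_nm / spacing_nm))

# Hypothetical gaps between staple ends at a vertex.
short_gap = poly_t_bulge_length(1.5)
long_gap = poly_t_bulge_length(2.0)
print(short_gap, long_gap)  # 4 5
```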

Before applying the staple-break rule of the maximized staple length, the sizes of the dsDNA domains of each initial staple are examined from the 5′-end to the 3′-end by using a searching bar, which is placed at the 5′-end of the initial staple to be segmented. The searching bar continues to move to the center position of the next dsDNA domain in the 5′-end direction until the distance traveled exceeds 60-bp length. Then it continues to move back to the previous dsDNA domain in the 3′-end direction until the domain located at the searching bar is longer than or equal to 14-bp length. Finally, the backbone nick is placed at the center of the domain, which divides the initial staple into two. The above steps are repeated until the length of the remaining staple is smaller than 60-bp length. The algorithm does not consider the inclusion of the 14-nt seed domain for the staple to be cut, but guarantees the 7-bp length as the minimum length of the dsDNA domain for the segmented staple.

The staple-break rule of the maximized number of seed domains is based on the previously suggested staple-break rule, where backbone nicks are never placed in a dsDNA domain longer than 7-bp and nicks are positioned 3 and 4-bp away from crossovers in a 7-bp domain. To apply the above staple-break rule to our staple route design procedure automatically, the searching bar that is initially placed at the 5′-end continues to move to the next dsDNA domain until finding a domain that is longer than or equal to the 14-bp length and the distance traveled exceeds 20-bp length. Then the backbone nick is placed at the center position of the next dsDNA domain regardless of its size. In the above rule, the initial staples are broken by considering the presence of the 14-nt seed domain of the staple to be cut, so each staple is most likely to contain more than one 14-nt seed domain. However, each broken staple has the potential to include dsDNA domains of small size since the rule does not consider the size of the domain to be broken.

Note that more than 90% of the staples broken by the maximized staple length rule contain the 14-nt seed domain, which is a slightly lower percentage than when applying the maximized number of seed domains rule. However, since it does not contain the small size domain that induces weak Watson-Crick base pairing, we adopted and used the maximized staple length as the staple-break rule in the staple routing.

After all staples are attached and segmented, each staple is denoted by a vector of numbers, with each value corresponding to the scaffold nucleotide to which it is base-paired. The input or generated scaffold sequence is then used to match base identities (A, T, G, or C) to the corresponding scaffold number assuming Watson-Crick base-pairing. If no sequence is provided, a segment of M13mp18 is used by default if the required scaffold length is less than or equal to 7,249-nt, and a sequence is randomly generated if the required length is greater. Finally, this list of staple sequences is output for synthesis (FIGS. 14G-14I). Equivalent schematic representations outlining each of the nine steps of the top-down sequence design procedure for scaffolded DNA of arbitrary target geometry based on an open surface that is discretized using a polygon mesh are shown in FIGS. 15A-15I.
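The mapping from a staple's vector of scaffold positions to its base sequence can be sketched as follows. The 0-based indexing, the use of `None` to mark unpaired poly-T positions, the function name, and the eight-base toy scaffold are all assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def staple_sequence(scaffold, paired_positions):
    """Translate a staple's scaffold positions into a 5'->3' staple sequence.

    scaffold:         scaffold sequence written 5'->3'.
    paired_positions: scaffold indices (0-based, an assumption) listed in
                      staple 5'->3' order; None marks an unpaired poly-T base.
    """
    return "".join(
        "T" if position is None else COMPLEMENT[scaffold[position]]
        for position in paired_positions
    )

toy_scaffold = "ATGCCGTA"  # hypothetical 8-nt scaffold, not M13mp18

# The staple runs antiparallel, so the scaffold indices decrease.
simple = staple_sequence(toy_scaffold, [5, 4, 3, 2])
with_bulge = staple_sequence(toy_scaffold, [1, 0, None, None, 7, 6])
print(simple, with_bulge)  # CGGC ATTTTA
```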

Nanoparticle Assembly

M13mp18 (NEB #N4040S; Bayou Biolabs #P-107) was incubated at 20 nM final concentration in a buffer containing 1×TAE (40 mM Trizma base, 20 mM glacial acetic acid, 1 mM EDTA) and 14 mM MgCl₂ and mixed with 800 nM final concentration staples (40× molar excess). The mix was annealed from 95° C. to 22° C. over 24 hrs (95° C. 3 min; 90° C. 3 min; 85° C.-70° C. 5 min at each temperature in 0.5° C. increments; 70° C.-22° C. 14 min at each temperature in 0.5° C. increments).

Confirmation of Nanoparticle Assembly and Structural Validation

Assembled nanoparticles were mixed with loading buffer and run on a 2% agarose gel with 1×TAE supplemented with 12 mM MgCl₂ and 1× INVITROGEN SYBR® SAFE for 3 hrs at 70 V at 4° C. and visualized under blue light. The nanoparticle sample was compared against scaffold, with a shift indicating a properly folded particle.

Assembled nanoparticles were purified from the excess staples by filtering on a pre-washed Amicon Ultra-0.5 mL centrifugal filter (100 kDa MWCO) and exchanged 5 times by spinning at 3,000 RPM for 20 minutes and re-suspended in clean buffer with MgCl₂. Purified nanoparticles were affixed to glow-discharged carbon-coated grids, stained with 2% uranyl acetate, and visualized by electron microscopy.

Results

The methods for top-down design of nanoparticles having edges formed of six double helices arranged in a honeycomb cross-sectional lattice were applied to produce DNA nanoparticles of different edge lengths and geometries.

The following DNA nanostructures were designed with scaffold routed through 6-helix-bundle edge types and structurally validated according to the methods: tetrahedron with 63 base pair edge lengths; tetrahedron with 84 base pair edge lengths; octahedron with 84 base pair edge lengths; and tetrahedron, octahedron, and pentagonal bipyramid nanoparticles with 42 base pair edge lengths. In addition, a tetrahedron with an edge length of 63 base pairs was folded with a beveled vertex type and visualized according to the methods.

The structures were assembled and folded as visualized by the gel shift migration between the scaffold and the folded particle. The resulting nanoparticles were structurally characterized by transmission electron microscopy or cryo-electron microscopy, the electron micrograph confirming the structures were assembled and folded according to the design criteria.

Example 3: Synthetic Yield and Homogeneity of Self-Assembled DNA Origami Objects

Methods and Materials

Materials

Chemicals

Tris-Acetate-EDTA buffer, MgCl₂, NaCl, TRIS-base, and nuclease-free water were purchased from Sigma-Aldrich. The Zymoclean gel DNA recovery kit was purchased from Zymo Research, Inc. and the Amicon Ultra-0.5 mL centrifugal filter (MWCO 100 kDa) from EMD Millipore, Corp. Restriction enzymes and the DNA ladder (Quick-Load® Purple 2-Log DNA ladder 0.1-10 kb) were provided by New England Biolabs, Inc. (NEB), the PCR enzyme (Accustart® Taq DNA polymerase HiFi) by Quanta Bioscience, Inc., low melt agarose by IBI Scientific, Inc., Seakem® agarose by Lonza Group, Ltd., and SYBR Green (10000×) by Thermo Fisher Scientific, Inc.

Oligonucleotides and DNA Templates

M13mp18 single-stranded DNA scaffold and Lambda DNA were provided by NEB (N4040S, N3011S). All oligonucleotides (for DNA assembly, asymmetric PCR, and scaffold digestion) and double stranded DNA gBlocks® were purchased from Integrated DNA Technologies, Inc. and used without further purification.

ssDNA Scaffold Synthesis

Single Stranded DNA Fragment of 200 to 6000 Nts Amplification Using aPCR

Asymmetric PCR amplification of ssDNA M13mp18: The asymmetric PCR was performed with a Mastercycler personal thermal cycler (Eppendorf, Inc.) using a sense primer concentration of 1 μM, an antisense primer concentration of 20 nM and 30 ng of M13mp18 ssDNA template. PCR primers were designed using Primer3 online software (v. 0.4.0) (Untergasser, A et al., Nucleic Acids Res. 40, e115-e115 (2012); Koressaar, T et al., Bioinformatics. 23, 1289-1291 (2007)) and are presented in Table 2. The maximum yield was obtained using 1 unit of Accustart Taq DNA polymerase HiFi in HiFi buffer complemented with 2 mM of magnesium sulfate and 200 μM of dNTPs mix in a final volume of 50 μL. Single strand synthesis was also achieved by using standard Taq polymerase (Sigma Aldrich #D1806) using the aPCR optimized protocol described for the Accustart Taq DNA polymerase HiFi and using the HiFi Buffer complemented with 2 mM of magnesium sulfate. The asymmetric PCR program used is as follows: 94° C., 1 min for the initial denaturation; followed by 30-40 cycles of 94° C., 20 sec; 55-57° C., 30 sec; 68° C., 1 min per kb to amplify. PCR products were run through 1% low melting temperature agarose gel pre-stained with EtBr, at 80 V for 1 h in TAE buffer. The ssDNA bands were extracted and purified using Zymoclean Gel DNA recovery kit. The ssDNA concentration was estimated using the NanoDrop 2000 (Thermo Fisher Scientific, Inc.).

Single Stranded DNA Fragment Larger than 3000 Nts Amplification Using aPCR

While AccuStart HiFi was capable of simple generation of ssDNA products, its processivity limited the amplification of large fragments. Initial tests with another Taq-based polymerase, NEB LongAmp, produced notable amounts of dsDNA byproduct when tested for amplification of the 1000 nts and the 3281 nts fragments. However, these byproducts were resolved by increasing the annealing temperature. This enzyme was then tested for use in long ssDNA synthesis. Phage λ genomic DNA (NEB) was used as a template for long-strand synthesis, with the protocol being only slightly modified, including using less template and increasing the extension time commensurate with the product length.

Results

With these optimizations, the LongAmp enzyme was capable of producing ssDNA products 10 and 12 kb in length. Asymmetric PCR amplification of dsDNA Lambda DNA for long ssDNA amplification: The asymmetric PCR was performed with a Mastercycler personal thermal cycler (Eppendorf, Inc.) using a sense primer concentration of 1 μM, an antisense primer concentration of 20 nM and 0.5 ng of Lambda dsDNA template for the 10 kb fragment and 1 ng of Lambda dsDNA template for the 12 kb fragment (Table 3).

Results of the amplification of the two long ssDNA fragments (10 and 12 kb) with the LongAmp [NEB] enzyme were confirmed by agarose gel electrophoresis.

The maximum yield was obtained using 5 units of LongAmp Taq DNA polymerase in LongAmp Taq reaction buffer complemented with 300 μM of dNTPs mix in a final volume of 50 μL. The asymmetric PCR program used is as follows: 94° C., 30 sec for the initial denaturation; followed by 20-35 cycles of 94° C., 30 sec; 56-60° C., 45 sec; 65° C., 50 sec per kb to amplify. PCR products were run through 0.7-0.8% low melting temperature agarose gel pre-stained with SybrSafe, at 70 V for 2 h in TAE buffer. The ssDNA bands were extracted and purified using Zymoclean Gel DNA recovery kit. The ssDNA concentration was estimated using the NanoDrop 2000 (Thermo Fisher Scientific, Inc.).

Asymmetric PCR amplification of dsDNA GBLOCK®: GBLOCK® fragments were prepared at a concentration of 10 ng/μL in Tris-EDTA buffer. PCR conditions and ssDNA recovery methods were the same as those used for asymmetric PCR on ssDNA plasmid. The hybridization temperature for each primer pair was adjusted for each experiment.

TABLE 2
Oligonucleotide primers and restriction enzymes used to generate ssDNA scaffold strands using aPCR or restriction enzyme digestion.

Template | Name | Method | 5′-oligos/primers | 3′-oligos/primers | 5′ enz | 3′ enz | Size (b)
M13mp18 ssDNA | PCR 449 | PCR | GTCGTCGTCCCCTCAAACT (SEQ ID NO: 1) | ATTAATGCCGGAGAGGGTAG (SEQ ID NO: 8) | N/A | N/A | 449
M13mp18 ssDNA | PCR 500 | PCR | GGACGCTATCCAGTCTAAACAT (SEQ ID NO: 2) | GAAAGAGGACAGATGAACGGTG (SEQ ID NO: 9) | N/A | N/A | 500
gBlock dsDNA | PCR 721-1 | PCR | GTCGTCGTCCCCTCAAACT (SEQ ID NO: 1) | GCTGAAAAGGTGGCATCAAT (SEQ ID NO: 10) | N/A | N/A | 721
gBlock dsDNA | PCR 721-2 | PCR | GTCGTCGTCCCCTCAAACT (SEQ ID NO: 1) | GCTGAAAAGGTGGCATCAAT (SEQ ID NO: 10) | N/A | N/A | 721
M13mp18 ssDNA | PCR 738 | PCR | CTACCCTCGTTCCGATGCT (SEQ ID NO: 3) | GTTAATGCCCCCTGCCTATT (SEQ ID NO: 11) | N/A | N/A | 738
M13mp18 ssDNA | PCR 750 | PCR | CGCTTTCTTCCCTTCCTTTCT (SEQ ID NO: 4) | GGCGATTAAGTTGGGTAACGC (SEQ ID NO: 12) | N/A | N/A | 750
M13mp18 ssDNA | PCR 769 | PCR | GTCGTCGTCCCCTCAAACT (SEQ ID NO: 1) | GCTGAAAAGGTGGCATCAAT (SEQ ID NO: 10) | N/A | N/A | 769
M13mp18 ssDNA | Frag 893 | Dig | TGGAAAGCGCAGTCTCTGAATTTACCG (SEQ ID NO: 5) | GACTTTTTCATGAGGAAGTTTCC (SEQ ID NO: 13) | BspHI | AlwNI | 893
M13mp18 ssDNA | PCR 1000 | PCR | GTCTCGCTGGTGAAAAGAAA (SEQ ID NO: 6) | ATTAATGCCGGAGAGGGTAG (SEQ ID NO: 8) | N/A | N/A | 1000
M13mp18 ssDNA | PCR 1000-2 | PCR | CTCGGTGGCCTCACTGATTAT (SEQ ID NO: 7) | GCTGCAAGGCGATTAAGTTGG (SEQ ID NO: 14) | N/A | N/A | 1000
gBlock dsDNA | PCR 1087-2 | PCR | GTCGTCGTCCCCTCAAACTC (SEQ ID NO: 15) | GCTGAAAAGGTGGCATCAAT (SEQ ID NO: 10) | N/A | N/A | 1087
M13mp18 ssDNA | PCR 1447 | PCR | TGCCTCAACCTCCTGTCAAT (SEQ ID NO: 16) | AGAGGCATTTTCGAGCCAGT (SEQ ID NO: 24) | N/A | N/A | 1447
M13mp18 ssDNA | PCR 1500 | PCR | GGGCTTGCTATCCCTGAAAAT (SEQ ID NO: 17) | GACTTGCGGGAGGTTTTGAAG (SEQ ID NO: 25) | N/A | N/A | 1500
M13mp18 ssDNA | PCR 1616 | PCR | GCGACGATTTACAGAAGCAA (SEQ ID NO: 18) | AAGGGCGAAAAACCGTCTAT (SEQ ID NO: 26) | N/A | N/A | 1616
M13mp18 ssDNA | Frag 1629 | Dig | TCGTCGCTATTAATTAATTTTCCCTTA (SEQ ID NO: 19) | ACGTGGACTCCAACGTCAAAGGGCGAA (SEQ ID NO: 27) | PacI | DrdI | 1629
M13mp18 ssDNA | PCR 2000 | PCR | GATGAGTGCGGTACTTGGTTTA (SEQ ID NO: 20) | GCTTTGACGAGCACGTATAACG (SEQ ID NO: 28) | N/A | N/A | 2000
M13mp18 ssDNA | PCR 2298 | PCR | CTACCCTCGTTCCGATGCT (SEQ ID NO: 3) | CGACGACAATAAACAACATGTTCAGCT (SEQ ID NO: 29) | N/A | N/A | 2298
M13mp18 ssDNA | PCR 2500 | PCR | CACGGTCGGTATTTCAAACCA (SEQ ID NO: 21) | CCTCTTCGCTATTACGCCAGC (SEQ ID NO: 30) | N/A | N/A | 2500
M13mp18 ssDNA | PCR 2805 | PCR | GTCGTCGTCCCCTCAAACT (SEQ ID NO: 1) | GTTAATGCCCCCTGCCTATT (SEQ ID NO: 11) | N/A | N/A | 2805
M13mp18 ssDNA | PCR 3000 | PCR | GCACTGACCCCGTTAAAACTTA (SEQ ID NO: 22) | CAGATTCACCAGTCACACGACC (SEQ ID NO: 31) | N/A | N/A | 3000
M13mp18 ssDNA | PCR 3281 | PCR | TCTTTGCCTTGCCTGTATGA (SEQ ID NO: 23) | GCTAACGAGCGTCTTTCCAG (SEQ ID NO: 32) | N/A | N/A | 3281
M13mp18 ssDNA | PCR 3356 | PCR | GTCTCGCTGGTGAAAAGAAA (SEQ ID NO: 6) | GTTAATGCCCCCTGCCTATT (SEQ ID NO: 11) | N/A | N/A | 3356

TABLE 3
Oligonucleotide primers used to generate long ssDNA scaffold strands using aPCR protocol with LongAmp enzyme.

Template | Name | Method | 5′-oligos/primers | 3′-oligos/primers | Size (b)
Lambda phage | PCR 10 kb | PCR | CAGTGCAGTGCTTGATAACAGG (SEQ ID NO: 33) | GTAGTGCGCGTTTGATTTCC (SEQ ID NO: 34) | 10,033
Lambda phage | PCR 12 kb | PCR | CAGTGCAGTGCTTGATAACAGG (SEQ ID NO: 33) | CCGAAGTCGAAATCAAGCTG (SEQ ID NO: 35) | 11,927

Single-Stranded DNA Fragment Digestion

The protocol used to cut M13mp18 ssDNA fragments with restriction enzymes was adapted from Said et al., Nanoscale. 5, 284-290 (2013). Briefly, PCR tubes containing approximately 3.5 μg (1.5 pmoles) of M13mp18 ssDNA and 10 molar equivalents of a pair of oligonucleotides (complementary to the two restriction site regions) in 50 μL of 1× NEB CUTSMART® buffer (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9) were annealed in a thermal cycler from 85° C. to 25° C. at a rate of 1° C. per min. 10 individual tubes were pooled and 100 units (10 μL) of each restriction enzyme were added directly to the mix. The mix was incubated at 37° C. for 3 h. After incubation, each sample was concentrated to 50 μL using an Amicon Ultra-0.5 mL centrifugal filter (MWCO 100 kDa), and run through a 1% low melting temperature agarose gel pre-stained with EtBr. Purification of ssDNA was performed with Zymoclean gel DNA recovery kit. Final ssDNA concentration was determined using the NanoDrop 2000.

aPCR Compared with Single-Stranded DNA Fragment Digestion

The aPCR method used here achieves higher quantities of scaffold with a smaller amount of starting material than the digestion of M13mp18 using restriction enzymes. Briefly, to obtain 3 pmoles of purified product, 5 pmoles of M13mp18 are required using the digestion method while only 0.012 pmoles are needed for aPCR amplification. With aPCR it is also possible to generate many different scaffold lengths, whereas digestion relies on restriction site positions. Table 2 lists the primers used for aPCR amplification, which can be combined as desired to achieve a diverse array of scaffold sizes without generating new primers for each custom length. The final quantity produced in a 50 μL PCR reaction tube is dependent on the fragment size and sequences, ranging between 1.5-4.5 pmoles.
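The 0.012 pmole figure is consistent with the 30 ng of M13mp18 template used per aPCR reaction, as the sketch below shows. It assumes an average mass of about 330 g/mol per nucleotide of ssDNA, a common approximation that is not stated in the text; the function name is hypothetical.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # approximate mass per ssDNA nucleotide (assumption)
M13MP18_LENGTH_NT = 7249       # scaffold length given in the description

def ssdna_pmol(mass_ng, length_nt):
    """Convert a mass of ssDNA in nanograms to picomoles."""
    molar_mass = length_nt * AVG_NT_MASS_G_PER_MOL  # g/mol
    return mass_ng * 1e-9 / molar_mass * 1e12

template_pmol = ssdna_pmol(30.0, M13MP18_LENGTH_NT)  # aPCR template input
print(round(template_pmol, 4))                        # about 0.0125 pmol

# Purified product obtained per pmole of M13mp18 consumed, for 3 pmol of product.
apcr_ratio = 3.0 / 0.012       # aPCR amplification
digestion_ratio = 3.0 / 5.0    # restriction digestion
print(round(apcr_ratio), digestion_ratio)
```

On these figures aPCR returns roughly 250 pmoles of product per pmole of template, compared with 0.6 for digestion.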

Folding of DNA Origami Objects

DNA Origami Assembly

DNA origami annealing reactions were realized in 50 μL reaction tubes containing the different ssDNA scaffolds in a 5-40 nM concentration range diluted in Tris-Acetate EDTA-MgCl₂ buffer (40 mM Tris, 20 mM acetic acid, 2 mM EDTA, 12 mM MgCl₂, pH 8.0). To ensure correct folding and to maximize yield, staple strand mixes were added in a 10-20× molar excess. Annealing was performed in a Mastercycler personal thermal cycler (Eppendorf, Inc.) with the following program: 95° C. for 5 min, 80-75° C. at 1° C. per 5 min, 75-30° C. at 1° C. per 15 min, and 30-25° C. at 1° C. per 10 min.
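The total run time implied by this annealing program can be estimated with a short calculation. The sketch assumes that each ramp lasts the number of degrees dropped multiplied by the minutes per degree, and that the transition from 95° C. to 80° C. takes negligible time; neither convention is stated in the text, and the variable names are hypothetical.

```python
# (start temperature, end temperature, minutes per 1 degree C step)
RAMPS = [
    (80, 75, 5),
    (75, 30, 15),
    (30, 25, 10),
]
INITIAL_HOLD_MIN = 5  # 95 degrees C for 5 min

def program_minutes(initial_hold, ramps):
    """Total duration: initial hold plus degrees dropped times minutes per degree."""
    return initial_hold + sum((start - end) * per_degree
                              for start, end, per_degree in ramps)

total_minutes = program_minutes(INITIAL_HOLD_MIN, RAMPS)
print(total_minutes, round(total_minutes / 60.0, 1))  # 755 12.6
```

Under these assumptions the program runs for about 12.6 hours, most of it in the 75-30° C. ramp.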

Characterization Methods

Agarose Gel Electrophoresis

Samples were loaded in 2% agarose gel in Tris-Acetate EDTA buffer supplemented with 12 mM MgCl₂ and pre-stained with EtBr. Gels were run on a BioRad electrophoresis unit at 4° C. for 3-4 h under a constant voltage of 70 V. Gels were imaged using a Gene flash gel imager (Syngene, Inc.), and yield was estimated by analyzing the band intensity with the Gel Analyzer program in the ImageJ software (Abramoff, M D et al., Biophotonics Int 11, 36-42 (2004)).

qPCR Thermal Analysis

qPCR analyses were performed in 384-well plate format using a Roche LightCycler® 480. A typical plate contained at least 3 replicates of each sample. Samples were complemented with 1× final concentration of SYBR® Green in a final volume of 20 μL. The scaffold concentration used for the tetrahedron analysis was 80 nM and the concentrations of each strand were adjusted to 1 μM for the three-way junction model. The annealing protocol used was identical to the one used for DNA origami assembly. SYBR® Green fluorescence was monitored over all experiments. Fluorescence curves obtained were analyzed using first-order derivatives to identify transition temperatures.
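The first-derivative analysis can be illustrated as follows. This is a minimal sketch on a synthetic sigmoidal curve, not the analysis code used for the experiments: the central-difference scheme, the synthetic transition at 55° C., and the function names are all assumptions.

```python
import math


def first_derivative(temps, signal):
    """Central-difference dF/dT at interior points; returns (temps, derivs)."""
    inner_t, derivs = [], []
    for i in range(1, len(temps) - 1):
        inner_t.append(temps[i])
        derivs.append((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
    return inner_t, derivs


def transition_temperature(temps, signal):
    """Temperature at which the fluorescence changes fastest (max |dF/dT|)."""
    inner_t, derivs = first_derivative(temps, signal)
    idx = max(range(len(derivs)), key=lambda i: abs(derivs[i]))
    return inner_t[idx]


# Synthetic curve (assumed): fluorescence falls as temperature rises,
# with a single transition centered at 55 C.
temps = [25 + 0.5 * i for i in range(141)]  # 25 C to 95 C in 0.5 C steps
signal = [1 / (1 + math.exp((t - 55.0) / 2.0)) for t in temps]

if __name__ == "__main__":
    print(transition_temperature(temps, signal))
```

Real fluorescence traces are noisy, so smoothing before differentiation would usually be needed; that step is omitted here.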

Results

To investigate the synthetic yield and homogeneity of self-assembled objects programmed using the computationally-generated scaffold and staple designs, asymmetric PCR (aPCR) (Wooddell, et al., Genome Res. 6, 886-892 (1996)) was used to generate object-specific scaffolds for quantitative yield in folding (FIGS. 6A-6B). Agarose gel electrophoresis indicated that digestion of M13mp18 by restriction enzyme produced 1629-b and 893-b fragments, and aPCR amplified a 721-b ssDNA strand. These custom scaffolds minimize excess single-stranded DNA in the final structure, which may result in non-specific object aggregation or otherwise interfere with folding as well as downstream chemical functionalization (FIGS. 7A-7E, and FIGS. 8A-8B) (Rothemund, Nature. 440, 297-302 (2006); Dunn, K E et al., Nature. 525, 82-86 (2015); Sobczak, J P J et al., Science. 338, 1458-1461 (2012)).

Monodispersity of multiple custom linear short scaffold strands, ranging from 450 to 3,400 nucleotides, was first confirmed using gel electrophoresis of aPCR products based on either the M13mp18 ssDNA plasmid or dsDNA fragments as templates. Custom scaffolds were used to fold tetrahedra of 31-, 42-, 52-, 63-, and 73-bp edge lengths in addition to an octahedron, two pentagonal bipyramids (42- and 52-bp edge lengths), a cube, a reinforced cube, an icosahedron, and a cuboctahedron. Agarose gel electrophoresis confirmed their high folding yield of approximately 60-90% (Table 4) and particle homogeneity that is characteristic of scaffolded DNA origami objects. Importantly, application of this aPCR approach offers folded sample purity that is similar to existing synthesis strategies that utilize restriction enzymes to generate sub-fragment scaffolds (FIGS. 7A-7E) (Said, H et al., Nanoscale. 5, 284-290 (2013)), yet without dependence on restriction sites and with higher synthetic yield.

Redesign of vertex staple nicks to be positioned at crossovers instead of interior segments of duplexes also resulted in increased folding stability (FIGS. 9A-9H). Thus, diverse polyhedral origami objects from 200 kDa to 1 MDa programmed using this top-down, inverse sequence design procedure self-assembled robustly using scaffolds of custom length and sequence, which may be natural or synthetic.

TABLE 4
Folding yield from 2% agarose gel electrophoresis for the different DNA objects.

DNA origami object      Edge length (bp)    Folding yield on agarose gel (%)
Tetrahedron             31                  90
Tetrahedron             42                  78
Tetrahedron             52                  93
Tetrahedron             63                  91
Tetrahedron             73                  91
Octahedron              52                  72
Cube                    52                  60
Reinforced cube         52                  84
Cuboctahedron           52                  88
Icosahedron             52                  74
Pentagonal bipyramid    42                  87
Pentagonal bipyramid    52                  83

Example 4: Verification of Structural Fidelity of Programmed DNA Origami

DNA nanoparticles of expected sizes and shapes were used to generate 3D density maps, and the structures of nanoparticles were validated using cryo-EM. Structural data for nanostructures designed and folded using the described methods are available at the EMDB (electron microscopy databank), as accession numbers EMD-3408 (Tetrahedron); EMD-3409 (Icosahedron); EMD-3410 (Octahedron); EMD-3411 (Cuboctahedron); EMD-3412 (Reinforced cube); and EMD-3413 (Nested cube).

Importantly, cryo-EM reconstructions confirmed that origami objects assembled as designed instead of “inside-out” while satisfying programmed Watson-Crick base pairing from sequence design. This result reaffirms the choice made in the sequence design formula to point the major groove inwards at vertices, which was based on the previous observation that DNA origami folds in this manner (He, Y et al., Angew. Chem. 122, 760-763 (2010)).

Example 5: Stability of DNA Origami Assembly in Various Salt Conditions

Methods and Materials

Buffer Exchange and Stability Experiments

Buffer Exchange

DNA origami objects were folded in TAE-Mg buffer (12 mM MgCl₂) and washed one time with TAE-Mg (12 mM MgCl₂) buffer using an Amicon Ultra-0.5 mL centrifugal filter (MWCO 100 kDa) and subsequently washed three times with the new stability buffer (TAE, PBS, or DMEM+FBS).

Stability Experiments

The stabilities of DNA origami objects in TAE, PBS, or DMEM FluoroBrite (0.35% BSA, 1% Penicillin/Streptomycin, 1% L-Glutamine) buffer complemented with 2% dialyzed fetal bovine serum (dFBS) or 10% FBS were evaluated for 6 h.

Results

An important limitation of DNA origami for biological as well as in vitro applications has been the requirement of high concentrations of either magnesium or monovalent cations for their folding and stability (Martin, T G et al., Nat. Commun. 3, 1103 (2012); Sobczak, J P J et al., Science. 338, 1458-1461 (2012)), which was recently shown to be alleviated by the use of single-duplex edge mesh-works that fold and are stable in physiological buffer and salt conditions (Yan, H et al., Science. 301, 1882-1884 (2003)). Folding of the 52-bp edge-length pentagonal bipyramid in increasing magnesium chloride (MgCl₂) concentrations including 1 mM, 2 mM, 4 mM, 6 mM, 8 mM, 12 mM, 16 mM, 20 mM, and 30 mM, and in increasing sodium chloride (NaCl) concentrations including 10 mM, 20 mM, 50 mM, 100 mM, 150 mM, 200 mM, 500 mM, 1 M, and 2 M, was characterized by 2% agarose gel electrophoresis. Similar analysis was conducted on folding of the 63-bp edge-length DNA tetrahedron. Stability of the 52-bp edge-length pentagonal bipyramid, after being folded in TAE-Mg (12 mM MgCl₂) buffer, followed by buffer exchange for 6 hours in PBS, TAE (without added NaCl or MgCl₂), or DMEM buffer with increasing concentration of FBS (0, 2, and 10%), was characterized using 2% AGE. Stability was observed for structures in PBS buffer but not in the absence of salt in TAE, which clearly demonstrates the importance of minimal salt concentration for stability. While degradation is observed for structures in DMEM media in the presence of 2 to 10% FBS, the presence of intact objects was detected after 6 hours. In summary, investigation of the folding properties of these synthesized DX-based objects revealed that objects fold effectively in cation concentrations as low as 4 mM Mg²⁺ and 500 mM Na⁺ (FIG. 10A), as well as in PBS alone (FIG. 10B). The use of DX-arms and multi-way junctions here uniquely results in structurally stable, rigid assemblies that are crucial to most applications (Rothemund, P W, Nanotechnology: Science and Computation. 3-21 (2006); Yan, H et al., Science. 301, 1882-1884 (2003)).

To test the utility of these objects for cellular assays, particles folded in TAE-MgCl₂ were transferred to PBS and Dulbecco's Modified Eagle's Medium (DMEM) containing 0 to 10% FBS, where they were found to be stable for at least six hours (FIG. 10B). When transferred to salt-free solution post-folding, however, particles were observed to be unstable, confirming the importance of a minimal amount of added salt for their longer-term stability (FIG. 10B). Interestingly, folding of the 52-bp tetrahedron using the full M13 scaffold instead of its custom scaffold also required higher magnesium concentrations to achieve folding, reaffirming the importance of the use of scaffolds that eliminate excess single-stranded DNA (FIGS. 9A-9B).

As a major advance in this direction, a top-down, geometry-driven sequence design procedure is developed that uses a spanning tree formula to determine scaffold crossover positions, which enables efficient and unique routing of the single-stranded scaffold throughout the target origami object, as well as automated staple assignment for custom synthesis of programmed origami objects of quantitative yield and high fidelity atomic-level structure. Asymmetric PCR provides full control over scaffold sequence and length, and use of the DX-based design further confers folding capacity and stability under diverse conditions including cell-compatible buffers. Combined, this strategy to realize the top-down design of nanoscale DNA assemblies offers full control over both 3D structure and local sequence which, together with the broadly usable software and experimental protocols, provides a versatile approach to the design of functionalized DNA objects of nearly arbitrary shape for numerous applications in biomolecular science and nanotechnology including nanoparticle delivery (Bhatia, D et al., Angew. Chem. Int. Ed. 48, 4134-4137 (2009); Douglas, S M et al., Science. 335, 831-834 (2012)), photonic applications (Sun, W et al., Science. 346, 1258361 (2014); Kuzyk, A et al., Nature. 483, 311-314 (2012)) that include self-assembled super-lattices (Liu, W et al., Science. 351, 582-586 (2016)), inorganic nanoparticle synthesis (Sun, W et al., Science. 346, 1258361 (2014)), memory storage (Church, G M et al., Science. 337, 1628-1628 (2012)), and single-particle cryo-EM analysis (He, Y et al., Nature. 452, 198-201 (2008); Bai, X et al., Proc. Natl. Acad. Sci. 109, 20012-20017 (2012); Wang, Z et al., Nat. Commun. 5, 4808 (2014); Irobalieva, R N et al., Nat. Commun. 6, 8440 (2015)) for proteins and RNAs that are not otherwise amenable to crystallography or NMR, amongst other applications (Jones, M R et al., Science. 347, 1260901 (2015)). The ability to synthesize nearly arbitrary geometric shapes that are automatically rendered from the top-down should enable the broad participation of non-experts in this powerful molecular design paradigm.
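The role of the spanning tree in placing scaffold crossovers can be sketched on a tetrahedron. The example below is a simplified illustration under stated assumptions, not the design software itself: it uses a breadth-first spanning tree, marks every edge outside the tree as carrying a scaffold double crossover, splits each such edge at a pseudo-node, and walks the resulting tree so that every edge is traversed once in each direction. Ordering of edges around each vertex, base-level positions, and staple assignment are omitted, and all names are hypothetical.

```python
from collections import deque

# Tetrahedron as a node-edge network: 4 vertices, 6 edges (a < b in each pair).
NUM_NODES = 4
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def spanning_tree(num_nodes, edges, root=0):
    """Return the edge set of a breadth-first spanning tree."""
    adjacency = {n: [] for n in range(num_nodes)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, tree, queue = {root}, set(), deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                tree.add((min(node, neighbor), max(node, neighbor)))
                queue.append(neighbor)
    return tree


def classify_edges(edges, tree):
    """Map each edge to True if it carries a scaffold double crossover."""
    return {edge: edge not in tree for edge in edges}


def scaffold_route(num_nodes, edges, tree, root=0):
    """Closed walk that follows tree edges and turns back at each crossover.

    Each non-tree edge is split into two half-edges ending at pseudo-nodes
    ("x", a, b), so the augmented graph is a tree and a depth-first walk
    traverses every edge once in each direction.
    """
    adjacency = {n: [] for n in range(num_nodes)}
    for a, b in edges:
        if (a, b) in tree:
            adjacency[a].append(b)
            adjacency[b].append(a)
        else:
            adjacency[a].append(("x", a, b))
            adjacency[b].append(("x", b, a))
    route = [root]

    def walk(node, parent):
        for nxt in adjacency.get(node, []):  # pseudo-nodes have no neighbors
            if nxt == parent:
                continue
            route.append(nxt)
            walk(nxt, node)
            route.append(node)

    walk(root, None)
    return route


tree = spanning_tree(NUM_NODES, EDGES)
crossovers = classify_edges(EDGES, tree)
route = scaffold_route(NUM_NODES, EDGES, tree)
```

For a connected network with V vertices and E edges, the spanning tree has V - 1 edges, so E - V + 1 edges carry a scaffold crossover; for the tetrahedron this gives three.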

Example 6: Single Stranded DNA Fragment Amplification Using Modified dNTPs

Methods and Materials

Amplification of Single Stranded DNA Fragments with Modified dNTPs

The methods for production of single-stranded nucleic acid scaffold sequences using aPCR can be adapted to incorporate modified dNTPs, for example, for the production of nanoparticles including custom-designed modifications to nucleic acids.

Asymmetric PCR (aPCR) amplification with Accustart HiFi was used to amplify ssDNA fragments using various percentages of modified dNTPs. Each dNTP was prepared separately at a concentration of 100 mM, and mixed prior to amplification at the correct ratio.

dUTP was used in a 1 to 100% range to replace dTTP. Cy5-dCTP was used in a range of 1-10%. Alpha-phosphate dNTPs were used from 1 to 100% to replace all four unmodified dNTPs.

The protocol used for amplification was the same as for normal fragment amplification with Accustart HiFi. Exemplary synthesis of ssDNA with aPCR using modified dNTPs for nanoparticle folding was carried out by amplification of a 1,000-nt fragment with different percentages of thiotriphosphate-modified dNTPs (i.e., ranging from 0% to 100% alpha-phosphate dNTPs).

Protection and Stability Assay

Nanoparticles formed using staples modified at their 3′ and 5′ ends by phosphorothioate were assembled, purified, and incubated in different percentages (0%, 2%, 5% and 10%, respectively) of mouse serum to assess degradation.

The results were visualized using gel electrophoresis of nanoparticles in equal starting molar amounts. A reduction in the intensity of the bands on the gel was indicative of nuclease degradation, whereas retained band intensity was indicative of nuclease protection.

Results

A single-stranded nucleic acid of 1,000 nts in length was amplified using asymmetric polymerase chain reaction (aPCR) with different percentages of alpha-phosphate dNTPs. Bands at 1,000 nts were visualized at the corresponding molecular weight on an agarose gel for aPCR products prepared to incorporate 0% (control), 10%, 20%, 40%, 50% and 75% thiotriphosphate-modified dNTPs.

Use of dUTP at different percentages (90% total, 95% total and 100% total) to replace dTTP for amplification of a 1,000-nt ssDNA fragment with aPCR was also confirmed by visualizing products having the corresponding molecular weight on an agarose gel.

Production of fluorescent ssDNA fragments of 2,000 nts by incorporating different concentrations of Cy5-dCTP was also demonstrated by aPCR (see FIG. 11).

Specifically, Cy5-modified dNTPs were incorporated into the 2,000 nt scaffold strand at concentrations ranging from 0.5%-10% Cy5, including 0.5%, 1%, 2%, 5%, and 10%, as confirmed by visualizing products having the corresponding molecular weight on an agarose gel and also by fluorescence spectroscopy of the resulting nucleic acid sequence. The Cy5-modified scaffold nucleic acid including 10% Cy5 was folded into a polyhedral shape according to the described methods. Folded fluorescent nucleic acid nanostructures were visualized as a gel-shifted band on an agarose gel.

The protection assay indicated that nanoparticles incorporating phosphorothioate-modified nucleic acids were less prone to exonuclease digestion in mouse serum than those that did not incorporate the modified nucleic acids.

Example 7: Conjugation of Molecules to Nucleic Acid Nanostructures

Methods and Materials

Experiments to demonstrate the use of nucleic acid nanostructures for the capture of other molecules were also carried out. In one experiment, an RNA molecule (mRNA encoding mCherry protein, transcribed by T7 RNA polymerase and acrylamide gel purified) was fixed to a tetrahedral nanostructure with 63 base pair edge length using single-stranded DNA overhangs extended from the staples at nick positions, with the sequence of the overhang complementary to predicted loops in the RNA structure (depicted in FIG. 16A).

The conjugation of nucleic acid nanostructures to target molecules was confirmed by gel electrophoresis, and the identity of the bound target material was also validated using cryo-electron microscopy (cryo-EM).

In a second experiment, the CRISPR enzyme Cpf1 with crRNA was captured onto a DNA nanoparticle, by conjugation to a sequence on a crossbeam structure built into the nanoparticle. The capture was mediated by a nucleic acid sequence targeted by the Cpf1/crRNA enzyme (depicted in FIG. 16B). The DNA template for the crRNA was synthesized by complementary oligonucleotide completion and purification followed by transcription using T7 RNA polymerase using the MegaShortScript transcription kit (Invitrogen). The crRNAs were purified using acrylamide gel purification followed by column purification (Qiagen RNeasy kit). ALT-R CRISPR-Cpf1 (IDT) was incubated with an equimolar concentration of the crRNA of interest. Two Cpf1 targets for EGFP were generated (CCTGGTCGAGCTGGACGGCGACG (SEQ ID NO:36); CTGAGCACCCAGTCCGCCCTGAG (SEQ ID NO:37)), and those same sequences were placed in the center region of the crossbeam across the tetrahedron with 84 base pair edge length. The pre-incubated Cpf1/crRNA complex was then added in 1.5-fold molar excess to 40 nM tetrahedron-crossbeam nanoparticles, incubated for 30 minutes at 37° C., and immediately placed on ice. The resulting complex was run on a 2% high-resolution agarose gel stained with INVITROGEN SYBR® SAFE at 65 V for 4 hrs at 4° C. Band shift due to complexed protein was indicated by less migration through the gel. Subsequently the gel was stained in Coomassie blue and destained in 10% acetic acid, 30% ethanol. Co-staining indicated comigration of protein with the nanoparticle.

In a third experiment, the CRISPR enzyme Cpf1 with crRNA was captured onto a DNA nanoparticle by conjugation onto an overhang sequence built into the nanoparticle, which contains a sequence complementary to a 3′ extension of the crRNA (depicted in FIG. 16C). As above, the crRNA was transcribed using T7 RNA polymerase MegaShortScript (Invitrogen) with additional sequences 3′ of the crRNA and purified as before. The RNA was complexed to the Cpf1 by incubation at 37° C. and annealed to a tetrahedron with 84 base pair edge length with one staple extended 3′ with a single-stranded DNA overhang reverse-complementary to the crRNA overhang. The Cpf1/crRNA was added in 1.5 molar equivalents to 40 nM tetrahedron and annealed from 42° C. to 22° C. over 1 hr, then placed on ice. The sample was then loaded on a 2% high-resolution agarose gel, and the gel was run and stained as described above.

Results

Binding of the RNA to the nanoparticle was seen as an increase in molecular weight using gel electrophoresis, with slower migration inducing a shift of the band on the gel. The bound RNA was also validated using cryo-electron microscopy.

Attachment of ALT-R™ CRISPR-Cpf1 with crRNA targeting a sequence in EGFP to a crossbeam containing 20 nucleotides of that target sequence (thus the reverse complement of the targeting crRNA sequence) was indicated by the induced gel shift in the corresponding lane, as compared with both substrate molecules alone. Further validation of the binding between ALT-R™ CRISPR-Cpf1 with crRNA and the nanostructure was observed through co-localization of the protein material when stained with Coomassie blue dye.

ALT-R™ CRISPR-Cpf1 with crRNA targeting a sequence in EGFP with a 3′ sequence extension of 14 nucleotides was attached to an overhang containing a complementary sequence of 14 nucleotides, as determined by the induced gel shift in the bound lane. Further validation of the binding was seen through co-localization of protein material when stained with Coomassie blue.

Accordingly, conjugation of target molecules to nanostructures produced by top-down design was confirmed.

Example 8: Scaffolded RNA Nanostructures

Methods and Materials

Nucleic acid nanostructures incorporating RNA as the single-stranded scaffold were designed and produced according to the described methods for top-down design for DX staple structures. The methods used to generate DNA-scaffolded DX-tile nanoparticles with two helices per edge were applied to design and produce an RNA-scaffolded tetrahedron with 66 base pair edge length. The same scaffold routing procedure was used, with the edges modified to extend to a multiple of 11 base pairs. Similarly, an RNA-scaffolded octahedron with 44 base pair edge length was also generated.
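The edge-length rule can be illustrated with a short calculation. The sketch below assumes 10.5 base pairs per turn for B-form edges (rounded to a whole number) and 11 base pairs per turn for A-form edges, as recited in the claims; the function names are hypothetical, and extending a 63 bp or 42 bp DNA edge to the next multiple of 11 is shown only as an example of how the 66 bp and 44 bp RNA edge lengths could arise.

```python
import math

B_FORM_BP_PER_TURN = 10.5  # DNA duplex edges
A_FORM_BP_PER_TURN = 11    # RNA-containing duplex edges


def b_form_edge_lengths(turns: int) -> tuple:
    """Whole-number edge lengths for a B-form edge, rounded down and up."""
    exact = turns * B_FORM_BP_PER_TURN
    return math.floor(exact), math.ceil(exact)


def a_form_edge_length(target_bp: int) -> int:
    """Extend an edge to the next multiple of 11 bp (integer arithmetic)."""
    step = A_FORM_BP_PER_TURN
    return (target_bp + step - 1) // step * step


if __name__ == "__main__":
    # Rounded-down B-form lengths for 2 to 7 helical turns.
    print([b_form_edge_lengths(n)[0] for n in range(2, 8)])
    # Illustrative extension of DNA edge lengths to A-form multiples.
    print(a_form_edge_length(63), a_form_edge_length(42))
```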

The RNA scaffold was synthesized using a template generated from a 1058 nucleotide segment of the M13mp18 DNA with an additional T7 promoter added to the 5′ primer. T7 RNA polymerase was used to synthesize the RNA, which was gel purified from polyacrylamide and column purified (Qiagen RNeasy kit). The purified RNA was mixed with 20-fold excess staples in a buffer composed of 100 mM HEPES-NaOH and 200 mM NaCl and slowly annealed over 24 hours.

Folding and assembly of the nanoparticles were determined by gel electrophoresis on 2.5% high resolution agarose in 1× Tris-borate-EDTA supplemented with 2.5 mM MgCl₂, run at 65 V for 3 hours at 4° C. The band was compared against single-stranded RNA scaffold alone. The structure was purified using an Amicon 100k MWCO spin filtration column spun at 3000 RPM for 20 minutes and buffer exchanged by returning to original volume 5 times. The particles were imaged using transmission electron microscopy by fixing to carbon grids, drying, and negative staining with 2% uranyl acetate.

Results

The RNA nanostructures were assembled and folded, as visualized by the gel shift in migration between the scaffold and the folded particle.

The resulting nanoparticles were structurally characterized by transmission electron microscopy, with the electron micrographs confirming that the structures were assembled and folded according to the design criteria.

This example demonstrated that the methods can be applied to RNA nanoparticles.

Example 9: HIV-1 RNA Genome Fragments Captured on DNA Nanostructures

To overcome the major gap in the knowledge of 3D structures of viral RNAs, a novel technical platform is implemented for the high-throughput and high-resolution determination of the 3D structure of RNAs, with application to the HIV-1 RNA genome. Application to the HIV-1 genome structure offers basic insight into the general principles of RNA folding, with future potential also for programming RNA structures for biotechnological applications.

The methods developed for determination of the 3D structure of RNAs are generally applicable to solving any 3D RNA structures, with downstream application to diverse viral genomes in addition to messenger RNAs, long non-coding RNAs, as well as other important classes of RNAs that play a central role in biology and disease. This structural knowledge is essential to understanding the diverse biological functions of RNAs including messaging, splicing and modification, protein interactions, translation regulation, catalysis, and genetic inheritance.

Methods and Materials

T7 RNA polymerase was used to transcribe RNA from double-stranded DNA templates encoding the 5′UTR and the RRE. These constructs had a 5′ T7 RNA polymerase promoter followed directly by the sequence. Transcription was done using the T7 megascript kit (Invitrogen). RNA was then purified on an 8% polyacrylamide gel by separating contaminants and extraction by diffusion into 300 mM sodium acetate. The RNA was then precipitated in 70% ice cold ethanol and placed at −20° C. overnight. The precipitate was pelleted in an Eppendorf centrifuge running at 14,000 rpm for 30 minutes. The pellet was re-suspended in H₂O. The RNA was then further cleaned by use of a QIAGEN RNA purification kit. Prior to binding, the RNA was heated to 65° C. for 3 minutes and then put on ice. 5× concentrated folding buffer (250 mM HEPES pH 7.6, 750 mM NaCl, 37.5 mM MgCl₂) was added to 1× final, and the RNA was placed at 37° C. for 1 hour.

DNA oligonucleotides were purchased from IDT that had sequences complementary to the targeted loops and an additional set of nucleotides complementary to a second, biotinylated strand. The two DNA oligonucleotides were annealed in 50 mM HEPES, 150 mM NaCl, and 7.5 mM MgCl₂ by heating to 95° C. and cooling stepwise to 25° C. over 1 hour. The annealed strands were then affixed to streptavidin coated magnetic beads (NEB). Excess duplex strands were removed by washing. RNA was then added in 5-fold molar excess and incubated with the beads coated in duplex DNA containing a region of single-stranded capture or bait sequence, with incubation at 37° C. for 10-15 minutes followed by incubation at room temperature for 10-15 minutes. Excess RNA was washed away. The beads containing bound duplex DNA and bound RNA were then exchanged to water, the DNA and RNA were released by incubation at 65° C. for 5 minutes, and the beads were pulled down by magnets. The eluate was then run on a denaturing polyacrylamide gel and RNA was visualized by staining with INVITROGEN SYBR® SAFE.

Tetrahedra were assembled with non-functionalized staples except for 3 staples, which were replaced by functionalized staples with either a 5′ or 3′ locked nucleic acid sequence complementary to the single-stranded RNA loop of the 5′ TAR; in addition, one strand was replaced in both tetrahedra by a functionalized strand with a biotin moiety. The scaffold sequence was incubated with 10-fold excess of staple strands and annealed over 14 hours from 95° C. to 25° C. The tetrahedra were purified by Amicon 100 kDa MWCO spin filters and buffer exchanged to 50 mM HEPES pH 7.6, 150 mM NaCl, and 7.5 mM MgCl₂. In different wells the tetrahedra were incubated with streptavidin coated magnetic beads (NEB) and then brought down with a magnet and washed four times. The folded RNA was then incubated with the tetrahedra-bead system for 20 minutes at 37° C., 20 minutes at 30° C. and approximately 30 s at room temperature. The beads were brought down by a magnet and subsequently washed 5 times. The tetrahedra and RNA were released by bringing the beads up in water and heating at 65° C. The released solution was run on a denaturing polyacrylamide gel and visualized in UV after staining with INVITROGEN SYBR® SAFE.

Results

RNA capture experiments using sequence-specific ssDNA overhangs to bind the target RNA towards solving structures were successfully carried out. DNA fragments were amplified from the HIV-1 genome plasmids p83-2 and p83-10 sequences from the NIH AIDS Reagent Program. DNA fragments encoding the whole 5′-UTR (nucleotides 1-346; FIGS. 5A-5F) and an engineered version of the 5′UTR with the long single-stranded primer binding site replaced by a short tetraloop, as used in NMR studies (Keane, S C et al., Science. 348, 917-921 (2015)), were amplified with a 5′ T7 RNA polymerase promoter, in addition to the Rev Response Element (RRE). RNA was then transcribed with the T7 RNA polymerase, gel purified, and refolded in a previously published buffer (Watts, J M et al., Nature. 460, 711-716 (2009)). The polyacrylamide gel verified RNA production of full-length 5′-UTR (WT) and 5′-UTR with the primer binding site truncated to a tetraloop (PBS), with the RRE used for size comparison. The transcribed, folded RNA was subjected to bead-based binding assays against immobilized DNA. Biotin-DNA duplex strands containing a single-stranded overhang complementary to the single-stranded loops of the RNA, as judged by previously published SHAPE analysis from the Weeks lab (Watts, J M et al., Nature. 460, 711-716 (2009); Siegfried, N A et al., Nature Methods. 11, 959-965 (2014)), were tested for their ability to bind RNA. The DNA duplexes were bound to beads and then RNA was incubated with the bait-duplex covered beads. These were then washed, and any bound RNA was released by heating and loaded on a denaturing urea polyacrylamide gel. Positive capture was seen for 4 different targets out of the 5 tested overhang sequences, including the TAR loop, the unpaired 5′-pseudoknot, a loop near the packaging signal, and the gag/pol Kozak sequence. The bait sequence complementary to the dimerization site sequence was unable to capture the RNA, likely due to dimerization blocking the binding.

Two tetrahedra of 63 bp edge length were assembled, each incorporating a single-stranded overhang targeting the TAR loop, in opposite orientations. These tetrahedra were additionally built to incorporate a biotin moiety on a separate staple strand overhang, opposite in space from the bait sequence. Streptavidin bead-based capture was then used to capture the tetrahedra, which were used for subsequent RNA binding. Release of RNA by heating, followed by loading on a polyacrylamide gel, showed that the tetrahedron was able to capture the RNA in both orientations of the overhang sequence targeting the TAR loop.

A platform to capture large RNA fragments using DNA nanostructures has been established. Assembling multiple bait sequences onto the nanostructures will allow capture of diverse RNA fragments.

1-3. (canceled)
 4. The polyhedral nucleic acid nanostructure of claim27, wherein each edge of the nanostructure comprises four or moreparallel or anti-parallel helices arranged in square cross-sectionalmorphology, or six or more parallel or anti-parallel helices arranged inhoneycomb lattice morphology.
 5. The polyhedral nucleic acidnanostructure of claim 4, wherein each vertex of the nanostructure hastwo or more edges that come together in an aligned angle to create abevel at the vertex.
 6. The method of claim 27, wherein the geometricshape does not have spherical topology. 7.-9. (canceled)
 10. Thepolyhedral nucleic acid nanostructure of claim 27, wherein thecrossovers are anti-parallel crossovers, wherein the target nucleic acidnanostructure comprises B-form nucleic acid helices along the edges,wherein the length of each edge is expressed as a multiple of 10.5 basepairs rounded up or down to the nearest whole number, and wherein thelength of each edge is between 21 base pairs and 1,000 base pairs,inclusive.
 11. The polyhedral nucleic acid nanostructure of claim 27,wherein the crossovers are anti-parallel crossovers, wherein the targetnucleic acid nanostructure comprises A-form nucleic acid helices alongthe edge, wherein the length of each edge is expressed as a multiple of11 base pairs, and wherein the length of each edge is between 22 basepairs and 1,100 base pairs, inclusive.
 12. The polyhedral nucleic acidnanostructure of claim 27, wherein the crossovers are parallelcrossovers, and wherein the target nucleic acid nanostructure comprisesA-form or B-form nucleic acid helices along the edges, and wherein thelength of each edge is between 20 base pairs and 1,100 base pairs,inclusive.
 13. The polyhedral nucleic acid nanostructure of claim 27,wherein the target nucleic acid nanostructure comprises A-form or B-formnucleic acid helices along the edge, and wherein the length of each edgeis 21 base pairs, 31 base pairs, 42 base pairs, 52 base pairs, 63 basepairs, or 73 base pairs.
 14. The polyhedral nucleic acid nanostructureof claim 27, wherein the nucleic acid scaffold is RNA, the staplestrands are RNA or the scaffold and staple strands are RNA, and whereinthe length of each edge is 44 base pairs, 55 base pairs, 66 base pairs,or 77 base pairs.
 15. (canceled)
 16. The polyhedral nucleic acidnanostructure of claim 27, wherein identifying a route for asingle-stranded nucleic acid scaffold that traces throughout thegeometric shape in step (b) further comprises: classifying each edge ofthe network based on its membership in the spanning tree, wherein edgesthat are members of the spanning tree do not have a scaffold doublecrossover, and edges that are not members of the spanning tree have ascaffold double crossover; splitting each edge that is not a member ofthe spanning tree into two edges, each containing a pseudo-node at thepoint of the scaffold crossover; splitting each node at each of thevertices into two pseudo-nodes; and wherein the Euler cycle representsthe route of a single-stranded nucleic acid scaffold that traces oncealong each edge in both directions throughout the entire geometricshape.
 17. The polyhedral nucleic acid nanostructure of claim 27,wherein identifying a route for a single-stranded nucleic acid scaffoldthat traces throughout the geometric shape in step (b) furthercomprises: classifying each edge of the network as one of four typesbased on its membership in the spanning tree and on whether it employsanti-parallel or parallel crossovers; edges that are members of thespanning tree have each scaffold portion start and end at differentvertices, and edges that are not members of the spanning tree have eachscaffold portion start and end at the same vertex splitting each edgethat is not a member of the spanning tree into two edges, eachcontaining a pseudo-node at the point of the scaffold crossover;splitting each node at each of the vertices into two pseudo-nodes; andwherein the Euler cycle represents the route of a single-strandednucleic acid scaffold by superimposing and connecting units of partialscaffold routing within an edge based on its classification and length.18. 
The polyhedral nucleic acid nanostructure of claim 27, whereinidentifying a route for a single-stranded nucleic acid scaffold thattraces throughout the geometric shape in step (b) further comprises:rendering each helix in the network as a line, based on the targetcross-section of each edge; calculating a loop-crossover structure,wherein two or more adjacent lines are connected to form loops and allpossible double-crossover locations between two loops are calculated;calculating a dual graph of the loop-crossover structure, wherein theloops and double-crossover locations of the network are converted tonodes and edges of the dual graph, respectively, wherein the spanningtree is of the dual graph network; calculating which of the locations ofdouble-strand crossovers will be used, wherein a single double-strandcrossover is placed at each edge that is the part of the spanning treeof the dual graph; and calculating the Euler cycle of the network,wherein the Euler cycle represents the route of a single-strandednucleic acid scaffold that traces once through each duplex throughoutthe entire geometric shape.
 19. The polyhedral nucleic acidnanostructure of claim 27, wherein the spanning tree of the network isdetermined using a breadth-first search or depth-first search.
20. The polyhedral nucleic acid nanostructure of claim 19, wherein the spanning tree is calculated using Prim's formula or Kruskal's formula.
21. The polyhedral nucleic acid nanostructure of claim 27, wherein the Euler cycle is the A-trail Euler cycle.
22. The polyhedral nucleic acid nanostructure of claim 27, wherein the geometric shape is a 3D polyhedral.
23.-26. (canceled)
27. A polyhedral nucleic acid nanostructure designed according to a method comprising: (a) providing or determining the geometric parameters of an input, wherein the input comprises a 3D polyhedral or 2D polygonal geometric shape of a target nucleic acid nanostructure; (b) identifying a route for a single-stranded nucleic acid scaffold that traces throughout the geometric shape comprising: rendering a wireframe mesh of the geometric shape as a node-edge network; using a spanning tree of the node-edge network to define the placement of scaffold crossovers; and determining an Euler cycle, wherein the single-stranded nucleic acid scaffold traces the Euler cycle throughout the node-edge network; (c) assembling the target nucleic acid nanostructure having the 3D polyhedral or 2D polygonal geometric shape comprising hybridizing the single-stranded nucleic acid scaffold to itself and/or staple strands to form the target nucleic acid nanostructure designed according to steps (a) and (b).
28. A three-dimensional polyhedral nucleic acid nanostructure comprising two nucleic acid anti-parallel helices spanning each edge of the nanostructure, wherein the three-dimensional structure is formed from single-stranded nucleic acid staple strands hybridized to a single-stranded nucleic acid scaffold, wherein the scaffold is routed through the Euler cycle of the network defined by vertices and lines of a node-edge network of the polyhedral nanostructure, wherein the nucleic acid nanostructure comprises at least one edge including a double-strand crossover, wherein the location of the double-strand crossover is determined by the spanning tree of the node-edge network of the polyhedral nanostructure, and wherein the staple strands are hybridized to the vertices, edges and double strand crossovers of the scaffold to define the shape of the nanostructure.
29. A three-dimensional polyhedral nucleic acid nanostructure comprising two nucleic acid parallel helices spanning each edge of the nanostructure, wherein the three-dimensional structure is formed from a single-stranded nucleic acid scaffold hybridized to itself and may also hybridize to single-stranded nucleic acid staple strands, wherein the scaffold is routed through the Euler cycle of the network defined by vertices and lines of a node-edge network of the polyhedral nanostructure, wherein the scaffold hybridizes to itself in at least one edge using parallel crossovers, and wherein the staple strands, if any, are hybridized to the edges and double strand crossovers of the scaffold to define the shape of the nanostructure.
30. A three-dimensional polyhedral or polygonal nucleic acid nanostructure comprising four or more nucleic acid anti-parallel helices spanning each edge of the nanostructure, wherein the three-dimensional structure is formed from the single-stranded nucleic acid staple sequences hybridized to a single-stranded nucleic acid scaffold sequence, wherein the scaffold is routed through the Euler cycle of the network defined by vertices and lines of a node-edge network of the polyhedral nanostructure, wherein the nucleic acid nanostructure comprises at least one edge including a double strand crossover, wherein the location of the double strand crossover is determined by a spanning tree of the dual graph of the network of the polyhedral or polygonal nanostructure, wherein the helices comprising an edge are arranged as a square lattice of four or more helices, or honeycomb lattice of six or more helices, wherein the helices meeting at a vertex are beveled or non-beveled, and wherein the staple strands are hybridized to the vertices, edges and double strand crossovers of the scaffold to define the shape of the nanostructure.
31. The polyhedral nucleic acid nanostructure of claim 27, further comprising a molecule selected from the group consisting of PNA, protein, lipid, carbohydrate, a small-molecule, a dye, and RNA, wherein the molecule is covalently or non-covalently bound to the nanostructure.
32. The polyhedral nucleic acid nanostructure of claim 27, further comprising a therapeutic, diagnostic or prophylactic agent.
33. A method of using the polyhedral nucleic acid nanostructure of claim 32 for the delivery of the therapeutic, diagnostic or prophylactic agent to a subject, the method comprising the step of administering the nanostructure to the subject.
34.-38. (canceled)
39. The polyhedral nucleic acid nanostructure of claim 27, wherein the scaffold comprises deoxyribonucleic acid selected from the group consisting of naturally-occurring dNTP, dUTP, fluorescent dNTP, alpha-phosphate dNTP, radioactive dNTP, and polyethylene glycol-modified dNTP, or combinations thereof.
40.-41. (canceled)
42. The nucleic acid nanostructure of claim 28, wherein single-stranded or double-stranded nucleic acid overhang sequences extend from nick positions from the oligonucleotide staple strands.
43. The nucleic acid nanostructure of claim 42, wherein the nucleic acid overhang sequences that extend from nick positions from the oligonucleotide staple strands form duplex reinforcements along one or more edges of the structure, or span between two vertices of the structure.
44. The nucleic acid nanostructure of claim 42, wherein the single-stranded or double-stranded nucleic acid overhangs comprise one or more sequences of nucleic acids that is complementary to a target RNA or DNA sequence.
45. The nucleic acid nanostructure of claim 42, wherein the single-stranded or double-stranded nucleic acid overhangs comprise one or more sequences of nucleic acids that interact with DNA binding proteins or RNA-binding proteins.
46. The nucleic acid nanostructure of claim 42, wherein the edge length and nanoparticle geometry is greater than the size of the target molecule that is to be captured, to allow for 1, 2, 3, or more than 3 molecules to be bound independently of any other.
47. The nanostructure of claim 27, wherein the nanostructure has a molecular weight of between 200 Daltons and 100 mega Dalton, inclusive.
48. The nanostructure of claim 27, wherein the nanostructure comprises a single-stranded scaffold sequence including one or more nucleic acid sequences complementary to a nucleic acid sequence corresponding to one or more of an mRNA, DNA, or an epitope recognized by a DNA binding protein.
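By way of illustration only, the routing steps recited in claims 17, 20 and 27 for a two-helix (anti-parallel crossover) edge can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the unit edge weights, the tetrahedral input and the pseudo-node labels are all hypothetical, and the Euler cycle is found with a generic Hierholzer traversal that does not enforce the A-trail (non-crossing at vertices) condition of claim 21.

```python
import heapq
from collections import Counter, defaultdict


def prim_spanning_tree(nodes, edges):
    """Spanning tree of the node-edge network via Prim's algorithm.
    edges: list of (u, v, weight). Returns a set of (min, max) node pairs."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    visited = {nodes[0]}
    heap = list(adj[nodes[0]])
    heapq.heapify(heap)
    tree = set()
    while heap and len(visited) < len(nodes):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.add((min(u, v), max(u, v)))
        for item in adj[v]:
            if item[2] not in visited:
                heapq.heappush(heap, item)
    return tree


def routing_multigraph(edges, tree):
    """Each edge carries two duplexes, so each edge is doubled.
    Tree edges: the scaffold runs vertex to vertex (no scaffold crossover).
    Non-tree edges: split into two half-edges, each ending at a pseudo-node
    that stands for the scaffold crossover (hypothetical labels ("x", u, v))."""
    multi = []
    for u, v, _ in edges:
        if (min(u, v), max(u, v)) in tree:
            multi += [(u, v), (u, v)]
        else:
            pu, pv = ("x", u, v), ("x", v, u)
            multi += [(u, pu), (u, pu), (v, pv), (v, pv)]
    return multi


def euler_cycle(multi, start):
    """Hierholzer's algorithm on an undirected multigraph (no A-trail check)."""
    adj = defaultdict(list)
    for i, (a, b) in enumerate(multi):
        adj[a].append((b, i))
        adj[b].append((a, i))
    used = [False] * len(multi)
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()  # discard segments already traversed
        if adj[node]:
            nxt, i = adj[node].pop()
            used[i] = True
            stack.append(nxt)
        else:
            cycle.append(stack.pop())
    return cycle


def covers_each_segment_once(cycle, multi):
    """True if the route uses every duplex segment exactly once."""
    walked = Counter(frozenset(p) for p in zip(cycle, cycle[1:]))
    return walked == Counter(frozenset(e) for e in multi)


# Hypothetical input: a tetrahedron with unit edge weights.
nodes = [0, 1, 2, 3]
edges = [(u, v, 1) for u in nodes for v in nodes if u < v]
tree = prim_spanning_tree(nodes, edges)
multigraph = routing_multigraph(edges, tree)
cycle = euler_cycle(multigraph, nodes[0])
crossovers = len(edges) - len(tree)  # one scaffold crossover per non-tree edge
print(crossovers, covers_each_segment_once(cycle, multigraph))
```

In this sketch, a network with V vertices and E edges has V - 1 spanning-tree edges and therefore E - V + 1 edges carrying a scaffold crossover. Modeling each crossover as a pair of pseudo-nodes keeps every scaffold portion on a non-tree edge starting and ending at the same vertex, as recited in claim 17.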